Abstract

We consider the problem of optimal probing of channel states by the transmitter and the receiver to maximize the rate of reliable
communication. The channel is a discrete memoryless channel (DMC) with i.i.d. states. The encoder takes probing actions that depend on the
message. It then uses the state information obtained from probing, causally or non-causally, to generate channel input symbols. The
decoder may also take channel probing actions as a function of the observed channel output and use the channel state information
thus acquired, along with the channel output, to estimate the message. We refer to the maximum achievable rate of reliable
communication for such systems as the 'Probing Capacity'. We characterize this capacity when the encoder and decoder actions
are cost constrained. To motivate the problem, we begin by characterizing the trade-off between the capacity and the fraction of
channel states the encoder is allowed to observe, while the decoder is aware of the channel states. In this setting of 'to observe or
not to observe' the state at the encoder, we compute numerical examples and note a pleasing phenomenon: the encoder
can observe a relatively small fraction of states and yet communicate at the maximum rate, i.e., the rate achieved when observing states at the encoder
is not cost constrained.

Index Terms 

Actions, Channel with States, Cost Constraints, Gel'fand-Pinsker Channel, Probing Capacity, Shannon Channel, To observe 
or not to observe. 



I. Introduction 

Shannon showed the importance of the availability of channel state information at the encoder of a communication system in his seminal paper
[1], where he computed the capacity of a DMC with i.i.d. states available causally to the encoder. This spawned active research
in channel coding, which was extended to various scenarios, notably storage in computer memory. Kuznetsov
and Tsybakov [2] constructed defect-correcting codes for coding in computer memories with defective cells. Gel'fand and
Pinsker [3] extended the work in [1] to the case where the channel states are available non-causally to the encoder, again with
applications to computer memories; this was further studied by Heegard and El Gamal [4]. Keshet, Steinberg and
Merhav [5] presented a detailed survey of channel coding in the presence of state information, where the channel state
information (CSI) is available at the transmitter (CSIT), at the receiver (CSIR), or both.

Permuter and Weissman introduced the notion of actions in the source coding context in [6]. Their setting generalizes
the Wyner-Ziv problem of source coding with decoder side information [7]: the decoder can now take actions based on
the index received from the encoder to affect the formation or availability of the side information. Weissman [8] studied the
channel coding dual, where the transmitter takes actions that affect the formation of the channel states. This framework captures
various new coding scenarios, including two-stage recording on a memory with defects, motivated by similar problems in
magnetic recording and computer memories. Kittichokechai et al. [9] studied a variant of the problems in [6] and [8], where the
encoder and decoder both have action-dependent partial side information. However, in the source coding formulation of [6],
the actions are taken only by the decoder, while in the channel coding scenarios of [8] and [9], the actions are taken only
by the encoder.

In this paper, we revisit channel coding scenarios, but now cost-constrained actions are taken by the encoder, the decoder, or both to acquire partial or complete
channel state information. Our framework aims to capture and understand the
trade-offs involved in natural scenarios where acquiring channel state information requires the expenditure of
costly system resources. The encoder and decoder actions are cost constrained, which creates a tension between the achievable rate and
the cost of acquiring the channel state (or defect) information. Note that our framework differs from those of [8] and
[9], where actions affect the channel and are followed by channel encoding. In our scenario the channel statistics are not affected, i.e.,
nature generates the state sequence i.i.d. $\sim P_S$. Our work is novel in the sense that not only the encoder but also the decoder
takes actions to acquire channel state information. The encoder takes actions ($A_e$) depending on the message, and the decoder takes
actions ($A_d$) depending on the observed channel output. Using their respective actions, the encoder and decoder observe the partial
states $S_e$ and $S_d$ through a discrete memoryless channel (DMC) $P_{S_e,S_d|S,A_e,A_d}$. The encoder can use
its partial state information causally or non-causally to generate the channel input symbols. In this paper, we characterize the fundamental limit of such
a framework and call it the Probing Capacity. When the decoder takes no actions, there is an equivalence between
our setting and that of channels with action-dependent states as in [8], which we make explicit in Section III.

The rest of the paper is organized as follows. We begin with a motivating scenario in Section II, where the decoder knows
the complete state and the encoder takes message-dependent binary actions to observe or not to observe the channel state.
This is generalized in Section III to the case where only the encoder takes actions. That section also establishes the equivalence between
our framework of optimal probing and that of channels with action-dependent states in [8]. Motivated by
communication over slow fading channels, where channel state information must be exploited on the fly,
Section IV characterizes the probing capacity when the encoder takes actions to acquire channel states and uses them causally
to construct channel inputs, and the decoder takes actions that depend strictly causally on the channel outputs. This is
a novel and more general setting, in which both the encoder and the decoder take costly actions to acquire channel state
information. Later in that section, inspired by coding for computer memories with defects, we discuss the non-causal case, i.e.,
when the encoder uses the channel states non-causally to generate channel input symbols and the decoder waits for the entire
channel output before taking actions to acquire channel states. This problem is hard in general, and we show its equivalence to a
relay channel with infinite lookahead. In Section V we work out several examples, with some surprising implications.
The paper concludes in Section VI with directions for future research.



II. To Observe or Not to Observe Channel States at Encoder 

We begin by explaining the notation used throughout this paper. Upper-case, lower-case, and calligraphic letters denote,
respectively, random variables, specific (deterministic) values that random variables may assume, and their alphabets. For two
jointly distributed random variables $X$ and $Y$, $P_X$, $P_{X,Y}$ and $P_{X|Y}$ denote, respectively, the marginal of $X$, the joint distribution
of $(X,Y)$, and the conditional distribution of $X$ given $Y$. $X_m^n$ is shorthand for the $(n-m+1)$-tuple $(X_m, X_{m+1}, \ldots, X_{n-1}, X_n)$.
All alphabets are assumed to have finite cardinality unless otherwise indicated.

In this section, we consider the problem of optimal probing where the encoder takes a costly action that depends on the message
and uses it to probe the channel, either observing the channel state or not. The actions are binary: action $A = 1$
corresponds to the encoder observing the channel state, while action $A = 0$ means that no state information is acquired. This
abstraction shares the motivation of the compressed sensing framework [10], where, owing to the
cost of sensing and measurement, one aims to acquire only a few noisy observations of a signal and still reconstruct the original signal
accurately. We further assume that the decoder knows the complete state information and that the encoder uses its partial state information
non-causally to generate the channel input symbols.



A. Problem Setup 

The setting is depicted in Fig. 1. The message $M$ is drawn uniformly from the message set
$\mathcal{M} = \{1, 2, \ldots, |\mathcal{M}|\}$. Nature generates the state sequence $S^n \in \mathcal{S}^n$ i.i.d. $\sim P_S$, independent of the message. A $(2^{nR}, n)$ code
consists of:

• Probing Logic: $f_A : \mathcal{M} \to \{0,1\}^n$, such that the action sequence $A^n = f_A(M)$ satisfies the cost constraint

$$\Lambda(A^n) = \frac{1}{n}\sum_{i=1}^{n} \Lambda(A_i) \le \Gamma, \tag{1}$$

where $\Lambda(\cdot)$ is the cost function and $\Gamma$ is the cost constraint. Given the nature-generated state sequence $S^n$ and the message-dependent
action sequence $A^n$, the encoder receives partial state information $S_e^n \in (\{*\}\cup\mathcal{S})^n$ through a deterministic channel
characterized by

$$S_e = h(S, A) = S \quad \text{if } A = 1, \tag{2}$$
$$S_e = h(S, A) = * \quad \text{if } A = 0, \tag{3}$$

where $*$ stands for an erasure, i.e., no information about the state symbol. Thus, $A = 1$ corresponds to observing the channel
state, while $A = 0$ corresponds to not observing it. Without loss of generality we assume $\Lambda(0) = 0$.

• Encoding: $f_e : (M, S_e^n) \mapsto X^n \in \mathcal{X}^n$, i.e., the encoder uses the partial state information non-causally to generate the channel
input symbols.

• Decoding: $f_d : (Y^n, S^n) \mapsto \hat{M} \in \{1, 2, \ldots, |\mathcal{M}|\}$, where the channel output $Y^n \in \mathcal{Y}^n$.
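As a minimal sketch of the probing model, the deterministic channel $h$ in (2)-(3) and the empirical cost in (1) can be written in a few lines of Python. It assumes integer-valued states, represents the erasure symbol $*$ by `None`, and uses the hypothetical cost function $\Lambda(a) = a$ (so that $\Lambda(0) = 0$). The function names are illustrative only.

```python
from typing import Callable, List, Optional, Sequence

ERASURE = None  # represents the erasure symbol '*'


def h(s: int, a: int) -> Optional[int]:
    """Deterministic probing channel (2)-(3): reveal s if a == 1, else erase."""
    if a not in (0, 1):
        raise ValueError("actions are binary")
    return s if a == 1 else ERASURE


def probe_sequence(s_seq: Sequence[int], a_seq: Sequence[int]) -> List[Optional[int]]:
    """Apply h symbol by symbol to obtain S_e^n from (S^n, A^n)."""
    if len(s_seq) != len(a_seq):
        raise ValueError("state and action sequences must have equal length")
    return [h(s, a) for s, a in zip(s_seq, a_seq)]


def average_cost(a_seq: Sequence[int],
                 cost: Callable[[int], float] = lambda a: a) -> float:
    """Empirical cost Lambda(A^n) in (1); the default Lambda(a) = a is an assumption."""
    return sum(cost(a) for a in a_seq) / len(a_seq)


def satisfies_cost(a_seq: Sequence[int], gamma: float,
                   cost: Callable[[int], float] = lambda a: a) -> bool:
    """Check the cost constraint Lambda(A^n) <= Gamma."""
    return average_cost(a_seq, cost) <= gamma


s = [2, 0, 1, 1, 0]
a = [1, 0, 0, 1, 0]
print(probe_sequence(s, a))    # [2, None, None, 1, None]
print(average_cost(a))         # 0.4
print(satisfies_cost(a, 0.5))  # True
```

In this toy run only the first and fourth states are observed, at an average cost of 0.4.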
The joint PMF on $(M, A^n, S^n, S_e^n, X^n, Y^n, \hat{M})$ induced by a given scheme is

$$P_{M,A^n,S^n,S_e^n,X^n,Y^n,\hat{M}}(m, a^n, s^n, s_e^n, x^n, y^n, \hat{m}) = \frac{1}{|\mathcal{M}|}\,\mathbb{1}\{a^n = f_A(m)\}\,\mathbb{1}\{x^n = f_e(m, s_e^n)\}\prod_{i=1}^{n} P_S(s_i)\,\mathbb{1}\{s_{e,i} = h(s_i, a_i)\}\,P_{Y|X,S}(y_i|x_i,s_i)\;\mathbb{1}\{\hat{m} = f_d(y^n, s^n)\}. \tag{4}$$

The probability of error is $P_e^{(n)} = P(\hat{M} \neq M)$, where $\hat{M} = f_d(Y^n, S^n)$. A rate $R$ is said to be achievable if there exists a
sequence of $(2^{nR}, n)$ codes of increasing block length satisfying the cost constraint (1), with $\frac{1}{n}\log|\mathcal{M}| \ge R$ and $P_e^{(n)} \to 0$ as $n \to \infty$.
Fig. 1. The encoder takes message-dependent actions to observe the state and encodes non-causally using the available partial state information, while the decoder knows the
complete channel state sequence.



B. Probing Capacity 

Theorem 1: The cost-constrained 'probing capacity' of the system in Fig. 1, with channel inputs constructed non-causally from the
observed state sequence and with the decoder having complete state information, is given by

$$C(\Gamma) = \max I(X;Y|S), \tag{5}$$

where the maximization is over all joint distributions of the form

$$P_{A,S,S_e,X,Y} = P_A P_S \mathbb{1}\{S_e = h(S,A)\} P_{X|S_e,A} P_{Y|X,S}, \tag{6}$$

for some $P_A, P_{X|S_e,A}$ such that $E[\Lambda(A)] \le \Gamma$.
Proof:

Achievability: We use rate splitting and multiplexing to achieve capacity (for a similar scheme refer to [11]). Note that in
this problem knowing $S_e$ determines $A$ (since $S_e = *$ if and only if $A = 0$); hence we show achievability with $P_{X|S_e,A}$ replaced by $P_{X|S_e}$. Without
loss of generality we assume $\mathcal{S} = \{1, 2, \ldots, |\mathcal{S}|\}$, hence $\mathcal{S}_e = \{*, 1, 2, \ldots, |\mathcal{S}|\}$. Fix $P_A, P_{X|S_e}$ that achieve $C(\Gamma/(1+\epsilon))$. We
split the message $M$ of rate $R$ into two messages $M_1$ and $M_2$ of rates $R_1$ and $R_2$, respectively.

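• Generation of Codebooks:

- Generate a codebook $\mathcal{C}_A$ of $2^{nR_1}$ sequences $\{A^n(m_1)\}_{m_1=1}^{2^{nR_1}}$, each drawn i.i.d. $\sim P_A$. To send the message $M = (M_1, M_2)$, if $A^n(M_1) \in
\mathcal{T}_\epsilon^{(n)}(A)$ (typical in the sense of [13]), the action sequence $A^n(M_1)$ is taken; otherwise $A^n = (0, 0, \ldots, 0)$ is taken. If
$A^n(M_1) \in \mathcal{T}_\epsilon^{(n)}(A)$, then by the typical average lemma [13] the cost constraint is satisfied:

$$\Lambda(A^n) = \frac{1}{n}\sum_{i=1}^{n} \Lambda(A_i) \le (1+\epsilon)E[\Lambda(A)] = \Gamma, \tag{7}$$

while the all-zero action sequence has zero cost since $\Lambda(0) = 0$.

- For every $A^n(m_1)$, generate a codebook $\mathcal{C}_X(m_1)$ of $2^{nR_2}$ tuples $\{(X_*^n(m_1,m_2), X_1^n(m_1,m_2), \ldots, X_{|\mathcal{S}|}^n(m_1,m_2))\}_{m_2=1}^{2^{nR_2}}$,
each consisting of $|\mathcal{S}|+1$ sequences of length $n$, such that $X_*^n, X_1^n, \ldots, X_{|\mathcal{S}|}^n$ are i.i.d. $\sim P_{X|S_e=*}, P_{X|S_e=1}, \ldots, P_{X|S_e=|\mathcal{S}|}$, respectively. Also
generate a codebook $\mathcal{C}_X^{\phi}$ of codewords $\{X_\phi^n(m_2)\}_{m_2=1}^{2^{nR_2}}$, i.i.d. $\sim P_{X|S_e=*}$.

• Encoding:

- Given a message $M = (M_1, M_2)$, the encoder takes the actions $A^n(M_1)$ or the all-zero action sequence, depending on whether $A^n(M_1)$ is
in $\mathcal{T}_\epsilon^{(n)}(A)$ or not. If $A^n(M_1) \in \mathcal{T}_\epsilon^{(n)}(A)$, the encoder observes $S_e^n = h(S^n, A^n(M_1))$ (applied symbol by symbol) and sends $X^n(M_1, M_2)$, formed by the
following multiplexing:

$$X_i = X_{*,i}(M_1, M_2) \quad \text{if } S_{e,i} = *, \tag{8}$$
$$X_i = X_{j,i}(M_1, M_2) \quad \text{if } S_{e,i} = j \in \{1, 2, \ldots, |\mathcal{S}|\}. \tag{9}$$

If $A^n(M_1) \notin \mathcal{T}_\epsilon^{(n)}(A)$, the encoder sends $X_\phi^n(M_2)$.

• Decoding: We perform successive decoding and demultiplexing. By successive decoding we mean that the decoder first decodes the actions
and then the actual codewords.

- On obtaining the channel output sequence $Y^n$ and the channel state sequence $S^n$, the decoder finds the smallest value of $\hat{M}_1$
for which $(A^n(\hat{M}_1), Y^n, S^n) \in \mathcal{T}_\epsilon^{(n)}(A, Y, S)$. If there is no such index, the decoder sets $\hat{M}_1 = 1$.

- Once the decoder has decoded $\hat{M}_1$, if $A^n(\hat{M}_1) \in \mathcal{T}_\epsilon^{(n)}(A)$, it knows $S_e^n = h(S^n, A^n(\hat{M}_1))$ and hence,
using the codebook $\mathcal{C}_X(\hat{M}_1)$, it demultiplexes $\{(X_*^n(\hat{M}_1, m_2), X_1^n(\hat{M}_1, m_2), \ldots, X_{|\mathcal{S}|}^n(\hat{M}_1, m_2))\}_{m_2=1}^{2^{nR_2}}$ to construct the
sequences $\{X^n(\hat{M}_1, m_2)\}_{m_2=1}^{2^{nR_2}}$ as

$$X_i(\hat{M}_1, m_2) = X_{*,i}(\hat{M}_1, m_2) \quad \text{if } S_{e,i} = *, \tag{10}$$
$$X_i(\hat{M}_1, m_2) = X_{j,i}(\hat{M}_1, m_2) \quad \text{if } S_{e,i} = j \in \{1, 2, \ldots, |\mathcal{S}|\}. \tag{11}$$

- After demultiplexing, if $A^n(\hat{M}_1) \in \mathcal{T}_\epsilon^{(n)}(A)$, the decoder finds the smallest value of $\hat{M}_2$ for which
$(X^n(\hat{M}_1, \hat{M}_2), Y^n) \in \mathcal{T}_\epsilon^{(n)}(X, Y | S^n, A^n(\hat{M}_1))$. If there is no such index, the decoder sets $\hat{M}_2 = 1$. If
$A^n(\hat{M}_1) \notin \mathcal{T}_\epsilon^{(n)}(A)$, the decoder finds the smallest value of $\hat{M}_2$ for which $(X_\phi^n(\hat{M}_2), Y^n) \in \mathcal{T}_\epsilon^{(n)}(X, Y | S^n)$; if there is no such index,
$\hat{M}_2 = 1$ is assumed.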
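The multiplexing rule (8)-(9), the identical demultiplexing rule (10)-(11), and the "smallest jointly typical index" decoding rule can be sketched in Python as below. This is a minimal illustration, not the paper's construction. It assumes robust typicality in the sense of [13], meaning every tuple's empirical frequency lies within a factor $1 \pm \epsilon$ of its probability, and it takes the joint PMF as a dictionary. Conditional typicality given $(S^n, A^n)$ is simplified here to joint typicality of all the sequences involved. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

ERASURE = None  # represents the erasure symbol '*'


def multiplex(s_e: Sequence[Optional[int]],
              sub: Dict[Optional[int], Sequence[int]]) -> List[int]:
    """Rule (8)-(9) at the encoder and (10)-(11) at the decoder: X_i = X_{S_{e,i}, i}.

    `sub` maps each value of S_e (ERASURE or a state j) to its n-sequence
    X_*^n(m1, m2) or X_j^n(m1, m2) taken from the codebook C_X(m1).
    """
    return [sub[v][i] for i, v in enumerate(s_e)]


def is_typical(seqs: Sequence[Sequence[Hashable]],
               pmf: Dict[Tuple, float], eps: float) -> bool:
    """Robust typicality: |pi(t) - p(t)| <= eps * p(t) for every tuple t."""
    n = len(seqs[0])
    if any(len(seq) != n for seq in seqs):
        raise ValueError("all sequences must have the same length")
    counts = Counter(zip(*seqs))
    if any(pmf.get(t, 0.0) == 0.0 for t in counts):
        return False  # a zero-probability tuple appears empirically
    return all(abs(counts[t] / n - p) <= eps * p for t, p in pmf.items())


def smallest_typical_index(codebook: Sequence[Sequence[Hashable]],
                           side: Sequence[Sequence[Hashable]],
                           pmf: Dict[Tuple, float], eps: float) -> int:
    """Smallest 1-based m with (codebook[m-1], *side) typical; 1 if there is none."""
    for m, cw in enumerate(codebook, start=1):
        if is_typical([cw, *side], pmf, eps):
            return m
    return 1


s_e = [ERASURE, 1, 2, ERASURE, 1]
x = multiplex(s_e, {ERASURE: [0] * 5, 1: [1] * 5, 2: [1, 0, 1, 0, 1]})
print(x)  # [0, 1, 1, 0, 1]

pmf = {(0, 0): 0.5, (1, 1): 0.5}  # hypothetical noiseless channel, uniform input
print(smallest_typical_index([[0, 0, 1, 1], [0, 1, 0, 1]], [[0, 1, 0, 1]], pmf, 0.1))  # 2
```

For the decoding of $M_1$ the side sequences would be $(Y^n, S^n)$ with a PMF on $(A, Y, S)$. For $M_2$ the codewords would first be demultiplexed with `multiplex`.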

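Analysis of Probability of Error: Without loss of generality, assume that $M = (M_1, M_2) = (1, 1)$ was sent. We have
the following error events:

- $\mathcal{E}_{11} = \{(A^n(1), Y^n, S^n) \notin \mathcal{T}_\epsilon^{(n)}(A, Y, S)\}$.

- $\mathcal{E}_{12} = \{(A^n(m_1), Y^n, S^n) \in \mathcal{T}_\epsilon^{(n)}(A, Y, S) \text{ for some } m_1 \neq 1\}$.

- $\mathcal{E}_{21} = \{(X^n(1,1), Y^n) \notin \mathcal{T}_\epsilon^{(n)}(X, Y | S^n, A^n(1))\}$.

- $\mathcal{E}_{22} = \{(X^n(1,m_2), Y^n) \in \mathcal{T}_\epsilon^{(n)}(X, Y | S^n, A^n(1)) \text{ for some } m_2 \neq 1\}$.

Let $\mathcal{E}$ denote the overall error event and let $\mathcal{F} = \{(A^n(1), X^n(1,1)) \in \mathcal{T}_\epsilon^{(n)}(A, X)\}$. Hence,

$$P(\mathcal{E}) = P(\mathcal{E} \cap \mathcal{F}) + P(\mathcal{E} \cap \mathcal{F}^c) \tag{12}$$
$$\le P(\mathcal{E} \cap \mathcal{F}) + P(\mathcal{F}^c). \tag{13}$$

Note that by the law of large numbers (LLN) [13], $P(\mathcal{F}^c) \to 0$ as $n \to \infty$.

We now show that $P(\mathcal{E} \cap \mathcal{F}) \to 0$. Let $\mathcal{E}_1 = \mathcal{E}_{11} \cup \mathcal{E}_{12}$ and $\mathcal{E}_2 = \mathcal{E}_{21} \cup \mathcal{E}_{22}$. By the LLN [13],
$P(\mathcal{F} \cap \mathcal{E}_{11}) \to 0$. By the packing lemma [13], $P(\mathcal{F} \cap \mathcal{E}_{12}) \to 0$ if $R_1 < I(A; Y, S) = I(A; Y|S)$, where the equality holds because $A$ is independent of $S$. By the union
bound, this implies $P(\mathcal{F} \cap \mathcal{E}_1) \le P(\mathcal{F} \cap \mathcal{E}_{11}) + P(\mathcal{F} \cap \mathcal{E}_{12}) \to 0$.

Similarly, by the LLN, $P(\mathcal{F} \cap \mathcal{E}_1^c \cap \mathcal{E}_{21}) \to 0$, and by the packing lemma, $P(\mathcal{F} \cap \mathcal{E}_1^c \cap \mathcal{E}_{22}) \to 0$ if $R_2 < I(X; Y|S, A)$. By
the union bound, this implies $P(\mathcal{F} \cap \mathcal{E}_1^c \cap \mathcal{E}_2) \le P(\mathcal{F} \cap \mathcal{E}_1^c \cap \mathcal{E}_{21}) + P(\mathcal{F} \cap \mathcal{E}_1^c \cap \mathcal{E}_{22}) \to 0$. Hence the total probability of
error satisfies

$$P(\mathcal{E} \cap \mathcal{F}) = P(\mathcal{F} \cap (\mathcal{E}_1 \cup \mathcal{E}_2)) \le P(\mathcal{F} \cap \mathcal{E}_1) + P(\mathcal{F} \cap \mathcal{E}_1^c \cap \mathcal{E}_2) \to 0,$$

if $R_1 < I(A; Y|S)$ and $R_2 < I(X; Y|S, A)$. Therefore, for vanishing probability of error we obtain

$$R = R_1 + R_2 < I(A;Y|S) + I(X;Y|A,S) = I(X,A;Y|S) = H(Y|S) - H(Y|X,S,A) = H(Y|S) - H(Y|X,S) = I(X;Y|S) = C\Big(\frac{\Gamma}{1+\epsilon}\Big),$$

where $H(Y|X,S,A) = H(Y|X,S)$ because $A - (X,S) - Y$ forms a Markov chain.

The proof of achievability is completed by letting $\epsilon \to 0$.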
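The rate chain above relies on the chain rule and on the Markov chain $A - (X,S) - Y$. A quick numerical sanity check, sketched in Python, confirms that $I(A;Y|S) + I(X;Y|A,S) = I(X,A;Y|S) = I(X;Y|S)$. It assumes binary alphabets and randomly drawn (seeded) PMFs in the product form (6), with $P_{X|S_e}$ as in the achievability scheme. The helper names are ours.

```python
import itertools
import math
import random
from collections import defaultdict


def entropy(pmf):
    """Entropy in bits of a dict-valued PMF."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, idx):
    """Marginal PMF of the coordinates listed in idx."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[i] for i in idx)] += p
    return out


def cond_mi(joint, xs, ys, zs):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) for index tuples xs, ys, zs."""
    h = lambda idx: entropy(marginal(joint, idx))
    return h(xs + zs) + h(ys + zs) - h(xs + ys + zs) - h(zs)


def random_pmf(k, rng):
    w = [rng.random() + 1e-3 for _ in range(k)]
    return [v / sum(w) for v in w]


rng = random.Random(0)
ERASURE = "*"
p_a = random_pmf(2, rng)
p_s = random_pmf(2, rng)
p_x_given_se = {se: random_pmf(2, rng) for se in (ERASURE, 0, 1)}
p_y_given_xs = {(x, s): random_pmf(2, rng) for x in (0, 1) for s in (0, 1)}

# Joint P(a, s, s_e, x, y) of the form (6); coordinate order: A=0, S=1, S_e=2, X=3, Y=4.
joint = {}
for a, s, x, y in itertools.product((0, 1), repeat=4):
    se = s if a == 1 else ERASURE  # S_e = h(S, A)
    joint[(a, s, se, x, y)] = p_a[a] * p_s[s] * p_x_given_se[se][x] * p_y_given_xs[(x, s)][y]

A, S, X, Y = (0,), (1,), (3,), (4,)
lhs = cond_mi(joint, A, Y, S) + cond_mi(joint, X, Y, A + S)  # I(A;Y|S) + I(X;Y|A,S)
mid = cond_mi(joint, X + A, Y, S)                            # I(X,A;Y|S)
rhs = cond_mi(joint, X, Y, S)                                # I(X;Y|S)
print(lhs, mid, rhs)
```

The three printed values agree up to floating-point rounding, and $I(A;Y|X,S)$ evaluates to zero under the same PMF.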
Converse: Suppose rate $R$ is achievable, and consider a sequence of $(2^{nR}, n)$ codes with $P_e^{(n)} \to 0$. Then

$$nR = H(M) \tag{14}$$
$$\overset{(a)}{=} H(M|S^n) \tag{15}$$
$$= I(M; Y^n|S^n) + H(M|Y^n, S^n). \tag{16}$$

By Fano's inequality [14],

$$H(M|Y^n, S^n) \le 1 + P_e^{(n)} nR \le n\epsilon_n, \tag{17}$$

where $\epsilon_n \to 0$ as $n \to \infty$. Now consider

$$I(M; Y^n|S^n) = H(Y^n|S^n) - H(Y^n|M, S^n) \tag{18}$$
$$= \sum_{i=1}^{n} H(Y_i|Y^{i-1}, S^n) - \sum_{i=1}^{n} H(Y_i|Y^{i-1}, M, S^n) \tag{19}$$
$$\overset{(b)}{=} \sum_{i=1}^{n} H(Y_i|Y^{i-1}, S^n) - \sum_{i=1}^{n} H(Y_i|Y^{i-1}, M, S^n, A^n, S_e^n, X^n) \tag{20}$$
$$\overset{(c)}{\le} \sum_{i=1}^{n} H(Y_i|S_i) - \sum_{i=1}^{n} H(Y_i|X_i, S_i) \tag{21}$$
$$= \sum_{i=1}^{n} I(X_i; Y_i|S_i) \tag{22}$$
$$\le \sum_{i=1}^{n} C(E[\Lambda(A_i)]) \tag{23}$$
$$\overset{(d)}{\le} nC\Big(\frac{1}{n}\sum_{i=1}^{n} E[\Lambda(A_i)]\Big) \tag{24}$$
$$= nC(E[\Lambda(A^n)]) \tag{25}$$
$$\overset{(e)}{\le} nC(\Gamma), \tag{26}$$

where

• (a) follows from the fact that the message is independent of the state sequence.

• (b) follows from the fact that $A^n = A^n(M)$, $S_e^n = h(S^n, A^n)$ and $X^n = X^n(M, S_e^n)$ are functions of $(M, S^n)$.

• (c) follows from the fact that conditioning reduces entropy and from the Markov chain $Y_i - (X_i, S_i) -
(Y^{i-1}, M, S^{n\setminus i}, A^n, S_e^n, X^{n\setminus i})$, which is due to the induced joint distribution (4).

• (d) follows from the fact that $C(\Gamma)$ is concave in $\Gamma$. This is proved as follows. Let $C(\Gamma_1)$ and $C(\Gamma_2)$ be
achieved by $P_A^{(1)}P_{X|S_e,A}^{(1)}$ and $P_A^{(2)}P_{X|S_e,A}^{(2)}$, respectively, and let $P_1(\cdot)$ and $P_2(\cdot)$ be the corresponding joint distributions of the form (6), so that

$$E_{P_1}[\Lambda(A)] \le \Gamma_1, \tag{32}$$
$$E_{P_2}[\Lambda(A)] \le \Gamma_2. \tag{33}$$

For $\lambda \in [0,1]$, consider the joint distribution $P_\lambda = \lambda P_1 + (1-\lambda)P_2$, which is again of the form (6) because $P_S$, $\mathbb{1}\{S_e = h(S,A)\}$ and $P_{Y|X,S}$ are common to both. Clearly,

$$E_{P_\lambda}[\Lambda(A)] \le \lambda\Gamma_1 + (1-\lambda)\Gamma_2. \tag{34}$$

For fixed $P_S$ and $P_{Y|X,S}$, $I(X;Y|S)$ is concave in $P_{X|S}$, and $P_{X|S}(x|s) = \sum_a P_A(a)P_{X|S_e,A}(x|h(s,a),a)$ is linear in $P_A P_{X|S_e,A}$. Hence $I(X;Y|S)$ is concave in
$P_A P_{X|S_e,A}$. Denoting by $R_\lambda$ the value of $I(X;Y|S)$ under $P_\lambda$, we have

$$\lambda C(\Gamma_1) + (1-\lambda)C(\Gamma_2) \le R_\lambda \le C(\lambda\Gamma_1 + (1-\lambda)\Gamma_2).$$

• (e) follows from the fact that $C(\Gamma)$ is nondecreasing in $\Gamma$, since a larger $\Gamma$ gives a larger feasible
region and hence a larger capacity.

We further note the following relations and Markov chains, which ensure that each term of (22) is evaluated under a distribution of the form (6) and thus justify (23):

• $A_i = A_i(M)$ is independent of $S_i$, since the state sequence is independent of the message and the actions are functions of the message.

• $X_i - (S_{e,i}, A_i) - S_i$ forms a Markov chain; see Appendix B for the proof.

• $Y_i - (X_i, S_i) - (A_i, S_{e,i})$ follows from the DMC assumption on the channel, which implies the induced joint probability
distribution (4).

Hence, combining (16), (17) and (26) and letting $n \to \infty$, we have $R \le C(\Gamma)$. ■
Note 1 (Causal Probing): The capacity is the same if the encoder generates the channel input sequence using the observed state causally.
The converse holds without change, as in the non-causal setting, and the achievability also remains the same, because the multiplexing uses only the currently observed partial state information.
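Note 2 (Probing Independent of Messages): If the action sequence is chosen independently of the message, time sharing is optimal.
This is because, when the action sequence is independent of the message, the setting is equivalent to the case where the decoder knows
the actions. The capacity in this case is

$$C(\Gamma) = \max I(X;Y|S,A) \tag{35}$$
$$= \max\big[P(A=0)\,I(X;Y|S,A=0) + P(A=1)\,I(X;Y|S,A=1)\big] \tag{36}$$
$$= \max\big[P(A=0)\,C(0) + P(A=1)\,C(1)\big], \tag{37}$$

where the maximization in (37) is over $P_A$ with $E[\Lambda(A)] \le \Gamma$, and $C(0)$ and $C(1)$ denote the capacities when the encoder never and always observes the state, respectively.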
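To make Theorem 1 and Note 2 concrete, the following Python sketch evaluates the objective $I(X;Y|S)$ of (5) over a coarse grid of distributions of the form (6) for binary $A$, $S$, $X$, $Y$. It then compares the result with the time-sharing value of (37). Everything here is an assumption for illustration only:

- the cost $\Lambda(a) = a$;
- a uniform state distribution;
- a hypothetical channel pair, namely a Z-channel in state 0 and an S-channel in state 1, with our own crossover conventions;
- the grid resolution.

A grid search only yields a lower bound on $C(\Gamma)$. The key step uses the fact that, since $A$ is independent of $S$, $P(X=1|S=s) = (1-p)P(X=1|S_e=*) + pP(X=1|S_e=s)$ with $p = P(A=1)$.

```python
import itertools
import math


def entropy(probs):
    """Entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(px, W):
    """I(X;Y) in bits for input PMF px and channel matrix W[x][y]."""
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(len(W[0]))]
    return entropy(py) - sum(px[x] * entropy(W[x]) for x in range(len(px)))


def objective(p, q_star, q_state, p_s, channels):
    """I(X;Y|S) under (6) with binary X, P(A=1) = p,
    q_star = P(X=1 | S_e=*) and q_state[s] = P(X=1 | S_e=s)."""
    total = 0.0
    for s, ps in enumerate(p_s):
        r = (1 - p) * q_star + p * q_state[s]  # P(X=1 | S=s); A is independent of S
        total += ps * mutual_information([1 - r, r], channels[s])
    return total


def probing_capacity_grid(gamma, p_s, channels, steps=10):
    """Grid-search lower bound on C(gamma) in (5), assuming the cost Lambda(a) = a."""
    grid = [i / steps for i in range(steps + 1)]
    best = 0.0
    for p in grid:
        if p > gamma + 1e-12:  # E[Lambda(A)] = P(A=1) must not exceed gamma
            continue
        for q_star in grid:
            for q_state in itertools.product(grid, repeat=len(p_s)):
                best = max(best, objective(p, q_star, q_state, p_s, channels))
    return best


# Hypothetical two-state example: state 0 is a Z-channel, state 1 an S-channel.
alpha = beta = 0.5
Z = [[1.0, 0.0], [beta, 1 - beta]]        # input 1 is flipped to 0 w.p. beta
S_ch = [[1 - alpha, alpha], [0.0, 1.0]]   # input 0 is flipped to 1 w.p. alpha
p_s = [0.5, 0.5]
channels = [Z, S_ch]

c0 = probing_capacity_grid(0.0, p_s, channels)    # never observe the state
c1 = probing_capacity_grid(1.0, p_s, channels)    # always observe the state
c02 = probing_capacity_grid(0.2, p_s, channels)   # observe at most 20% of the states
ts02 = 0.8 * c0 + 0.2 * c1                        # time sharing as in (37)
print(f"C(0)={c0:.4f} C(1)={c1:.4f} C(0.2)={c02:.4f} time sharing at 0.2={ts02:.4f}")
```

In this hypothetical instance the search gives $C(0) \approx 0.311$ and $C(1) \approx 0.322$ bits. Already at $\Gamma = 0.2$ it reaches the $\Gamma = 1$ value, which is above the time-sharing value of about 0.313. This happens because the two states favor mirror-image input distributions, so observing a small fraction of the states suffices to tilt the input appropriately.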



III. Equivalence between Encoder Probing and Channels with Action-Dependent States 

In the previous section we motivated the basic problem of characterizing the capacity when observing the channel state
at the encoder comes at a price, assuming that the decoder knew the complete state information. In this section,
we point out the equivalence between the general setting of action-dependent channel probing at the encoder and the setting of channels
with action-dependent states considered in [8]. In our generalized setting, actions take values in an alphabet $\mathcal{A}$ and the encoder
observes $S_e$ through a DMC $P_{S_e|S,A}$. The setting in [8] and [9] is as follows. Given a message $M$, the encoder takes actions
$A^n = A^n(M)$, which affect the formation of the channel states. These states are then used by the encoder, causally or non-causally,
to generate the channel input.

TABLE I

Equivalence of the setting in [8] to our formulation of optimal probing at the encoder.

Fig. 2. Equivalence of our setting of probing the channel state at the encoder to that of channels with action-dependent states in [8].

First consider the case where the decoder does not know the channel states. In our setting, nature provides
$P_S$, $P_{S_e|S,A}$ and $P_{Y|X,S}$, but this is equivalent to $P_{S_e|A}$ and $P_{Y|X,S_e,A}$, since $S^n$ is available at neither the encoder nor the decoder and hence
can be averaged out. This establishes the equivalence depicted in Table I and Fig. 2. If the decoder now knows the channel
state $S_d$ through a DMC $P_{S_d|S,S_e,A}$, we can replace $Y$ in Fig. 2 with $(Y, S_d)$ to compute the capacity.
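The averaging argument can be sketched in Python as follows. The helper below, whose name and dictionary-based PMF encoding are our own assumptions, computes $P_{S_e|A}(s_e|a) = \sum_s P_S(s)P_{S_e|S,A}(s_e|s,a)$ and $P_{Y|X,S_e,A}(y|x,s_e,a) = \sum_s P(s|s_e,a)P_{Y|X,S}(y|x,s)$. These expressions are valid because the encoder sees $S$ only through $S_e$, so that $X - (S_e,A) - S$.

```python
from collections import defaultdict


def equivalent_channels(p_s, p_se_given_s_a, p_y_given_x_s, actions, inputs):
    """Average out S and return P_{S_e|A} and P_{Y|X,S_e,A} as nested dicts.

    p_s:            {s: P_S(s)}
    p_se_given_s_a: {(s, a): {s_e: P(s_e | s, a)}}
    p_y_given_x_s:  {(x, s): {y: P(y | x, s)}}
    """
    p_se_given_a = {}
    p_y_given_x_se_a = {}
    for a in actions:
        joint = defaultdict(float)  # P(s, s_e | a); S is independent of A
        for s, ps in p_s.items():
            for se, q in p_se_given_s_a[(s, a)].items():
                joint[(s, se)] += ps * q
        marg = defaultdict(float)   # P(s_e | a)
        for (s, se), v in joint.items():
            marg[se] += v
        p_se_given_a[a] = dict(marg)
        for se, pse in marg.items():
            if pse == 0.0:
                continue
            for x in inputs:
                out = defaultdict(float)
                for s in p_s:
                    post = joint.get((s, se), 0.0) / pse  # P(s | s_e, a) by Bayes' rule
                    if post == 0.0:
                        continue
                    for y, w in p_y_given_x_s[(x, s)].items():
                        out[y] += post * w
                p_y_given_x_se_a[(x, se, a)] = dict(out)
    return p_se_given_a, p_y_given_x_se_a


# Hypothetical example: uniform binary state, erasure probing, Y = X xor S.
p_s = {0: 0.5, 1: 0.5}
probe = {(s, a): ({s: 1.0} if a == 1 else {"*": 1.0}) for s in (0, 1) for a in (0, 1)}
chan = {(x, s): {x ^ s: 1.0} for x in (0, 1) for s in (0, 1)}
pse, pyx = equivalent_channels(p_s, probe, chan, actions=(0, 1), inputs=(0, 1))
print(pse[1], pyx[(1, 0, 1)], pyx[(0, "*", 0)])  # {0: 0.5, 1: 0.5} {1: 1.0} {0: 0.5, 1: 0.5}
```

In this toy example, observing the state ($a = 1$) yields a noiseless equivalent channel from $X$ to $Y$. Not observing it ($a = 0$) yields a completely noisy one.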

Hence, using this equivalence, we invoke and list the theorems of [8], transformed for our setting.

Theorem 2 (Equivalent to Theorem 1 in [8]): The 'probing capacity' for optimal channel state observation at the encoder,
which generates channel inputs using partial state information non-causally as in Fig. 2, with cost constraint $\Gamma$, is given by

$$C_{nc}(\Gamma) = \max\,[I(U; Y, S_d) - I(U; S_e|A)] \tag{38}$$
$$= \max\,[I(A, U; Y, S_d) - I(U; S_e|A)], \tag{39}$$

where the maximization is over all joint distributions of the form

$$P_{A,S,S_e,U,S_d,X,Y} = P_A P_S P_{S_e|S,A} P_{U|S_e,A} P_{S_d|S,S_e,A}\,\mathbb{1}\{X = f(U, S_e)\}\,P_{Y|X,S}, \tag{40}$$

for some $P_A$, $P_{U|S_e,A}$, $f$ such that $E[\Lambda(A)] \le \Gamma$ and $|\mathcal{U}| \le |\mathcal{A}||\mathcal{S}||\mathcal{S}_e||\mathcal{S}_d||\mathcal{X}| + 3$.

Theorem 3 (Equivalent to Theorem 2 in [8]): The 'probing capacity' for optimal channel state observation at the encoder,
which generates channel inputs using partial state information causally as in Fig. 2, with cost constraint $\Gamma$, is given by

$$C_c(\Gamma) = \max I(U; Y, S_d), \tag{41}$$

where the maximization is over all joint distributions of the form

$$P_{U,A,S,S_e,S_d,X,Y} = P_U\,\mathbb{1}\{A = g(U)\}\,P_S P_{S_e|S,A} P_{S_d|S,S_e,A}\,\mathbb{1}\{X = f(U, S_e)\}\,P_{Y|X,S}, \tag{42}$$

for some $P_U$, $g$, $f$ such that $E[\Lambda(A)] \le \Gamma$ and $|\mathcal{U}| \le \min\{|\mathcal{Y}||\mathcal{S}_d|,\ |\mathcal{A}||\mathcal{S}||\mathcal{S}_e||\mathcal{S}_d||\mathcal{X}| + 3\}$.

Note 3: The auxiliary variable $U$ has a larger cardinality than in the equivalent setting of [8]. This stems
from the following:

• The output $Y$ is replaced with $(Y, S_d)$; hence in the causal setting we have $|\mathcal{U}| \le |\mathcal{Y}||\mathcal{S}_d|$, following the arguments in [5].

• To preserve $P_{A,S,S_e,S_d,X}$ in both the causal and the non-causal settings we need $|\mathcal{U}| \le |\mathcal{A}||\mathcal{S}||\mathcal{S}_e||\mathcal{S}_d||\mathcal{X}| - 1$. In the causal setting,
four more elements are needed: one to preserve $H(Y, S_d|U)$, one to preserve the independence of $S$ and $(A, U)$, and two
more to preserve the Markov chains $(S_e, S_d) - (S, A) - U$ and $X - (U, S_e) - (A, S, S_d)$. In the non-causal setting, four
more elements are needed: one to preserve $H(S_e|A, U) - H(Y, S_d|U)$, one to preserve the independence of $S$ and $A$, and
two more to preserve the Markov chains $U - (S_e, A) - (S, S_d)$ and $X - (U, S_e) - (A, S, S_d)$.

Deriving Theorem 1 using Theorems 2 and 3: We would like to derive the capacity result of Theorem 1 from Theorems 2
and 3. We have already pointed out that the capacity of the setting in Fig. 1 is the same whether the encoder uses the partial
information causally or non-causally; call it $C(\Gamma) = C_c(\Gamma) = C_{nc}(\Gamma)$, where the subscripts 'c' and 'nc' denote the capacity for causal
and non-causal encoding of partial state information. We now prove $C(\Gamma) = C_c(\Gamma) = C_{nc}(\Gamma)$ using Theorems
2 and 3.

For non-causal encoding (using Theorem 2, with $S_d = S$ and $P_{S_e|S,A} = \mathbb{1}\{S_e = h(S,A)\}$):

$$C_{nc}(\Gamma) = \max\,[I(A, U; Y, S) - I(U; S_e|A)] \tag{43}$$
$$= \max\,[I(A, U; Y|S) + I(A, U; S) - I(U; S_e|A)] \tag{44}$$
$$\overset{(a)}{=} \max\,[H(Y|S) - H(Y|S, A, U, S_e, X) + I(U; S|A) - I(U; S_e|A)] \tag{45}$$
$$= \max\,[H(Y|S) - H(Y|S, A, U, S_e, X) - H(U|S, A, S_e) + H(U|S_e, A)] \tag{46}$$
$$\overset{(b)}{=} \max\,[H(Y|S) - H(Y|S, X) - H(U|A, S_e) + H(U|S_e, A)] \tag{47}$$
$$= \max I(X; Y|S), \tag{48}$$

where

• (a) follows from the fact that $S_e = h(S, A)$ and $X = f(U, S_e)$, and that $A$ is independent of $S$.

• (b) follows from the DMC assumption ($P_{Y|X,S}$) and from the Markov chain $U - (S_e, A) - S$.

This maximization is over joint distributions

$$P_{A,S,S_e,U,X,Y} = P_A P_S P_{S_e|S,A} P_{U|S_e,A}\,\mathbb{1}\{X = f(U, S_e)\}\,P_{Y|X,S} \tag{49}$$
$$\overset{(c)}{=} P_A(a) P_S(s) P_{S_e|S,A}(s_e|s,a) P_{U|S_e}(u|s_e)\,\mathbb{1}\{x = f(u, s_e)\}\,P_{Y|X,S}(y|x,s), \tag{50}$$

whose marginal on $(A, S, S_e, X, Y)$ is

$$P_{A,S,S_e,X,Y} = P_A P_S P_{S_e|S,A} P_{X|S_e} P_{Y|X,S}, \tag{51}$$

where (c) follows from the fact that knowing $S_e$ implies knowing $A$. Conversely, any $P_{X|S_e}$ is obtained by taking $U = X$ and $f(u, s_e) = u$. Hence, from (48) and (51), $C_{nc}(\Gamma) = C(\Gamma)$.
Now for causal encoding (using Theorem 3):

$$C_c(\Gamma) = \max I(U; Y, S) \tag{52}$$
$$\overset{(d)}{=} \max I(U; Y|S) \tag{53}$$
$$\overset{(e)}{=} \max I(A, U; Y|S) \tag{54}$$
$$= \max\,[H(Y|S) - H(Y|S, A, U, S_e, X)] \tag{55}$$
$$= \max\,[H(Y|S) - H(Y|S, X)] \tag{56}$$
$$= \max I(X; Y|S), \tag{57}$$

where (d) follows from the fact that $U$ and $S$ are independent and (e) follows from the relation $A = g(U)$. This maximization
is over joint distributions

$$P_{U,A,S,S_e,X,Y} = P_U\,\mathbb{1}\{A = g(U)\}\,P_S P_{S_e|S,A}\,\mathbb{1}\{X = f(U, S_e)\}\,P_{Y|X,S}. \tag{58}$$
We now show that every joint distribution of the form in Theorem 1 is contained in (58). The joint distribution in Theorem 1 is

$$P_{A,S,S_e,X,Y} = P_A P_S P_{S_e|S,A} P_{X|S_e,A} P_{Y|X,S} \tag{59}$$
$$\overset{(f)}{=} \sum_q P_S P_{S_e|S,A} P_Q(q) P_A\,\mathbb{1}\{X = f(S_e, A, q)\}\,P_{Y|X,S} \tag{60}$$
$$\overset{(g)}{=} \sum_u P_U(u)\,\mathbb{1}\{A = g(u)\}\,P_S P_{S_e|S,A}\,\mathbb{1}\{X = F(u, S_e)\}\,P_{Y|X,S}, \tag{61}$$

where (f) follows from the functional representation lemma [13], with $Q$ independent of $(S, S_e, A)$ (possible because $X - (S_e, A) - S$), and (g) follows from defining
$U = (A, Q)$, $g(U) = A$ and $F(U, S_e) = f(S_e, A, Q)$. Hence, by (57) and (61), we have shown that $C_c(\Gamma) \ge C(\Gamma)$. But $C_c(\Gamma) \le C_{nc}(\Gamma) = C(\Gamma)$. This
completes the claim.



IV. Optimal Probing at Both Encoder and Decoder 

In earlier sections we considered the framework where only the encoder was allowed to take actions. In this section we further
generalize the setting so that the decoder can also take actions based on the channel output and then obtain its own partial state
information, which it uses to estimate the transmitted message. We motivate this general setting in the framework
of communication over slow fading channels.

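[Fig. 3 block diagram: the message $M \in \{1 : 2^{nR}\}$ drives the encoder probe ($A_{e,i}(M)$) and the channel encoder ($X_i(M, S_e^i)$); the probing channel $P_{S_e,S_d|S,A_e,A_d}$ is driven by the state $S_i$; the DMC $P_{Y|X,S}$ produces $Y$; the decoder probe and the channel decoder produce $\hat{M}(Y^n, S_d^n)$.]

Fig. 3. The encoder and decoder both take actions to observe partial state information and use it for encoding and decoding.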



Consider a point-to-point communication system where in each time epoch the channel state is i.i.d. $\sim P_S(s_i)$, $s_i \in \mathcal{S}$. In the
next epoch the information about the present state is lost; hence the encoder and decoder have to exploit causally whatever information is
available to them to get the best achievable rate. More precisely, consider the setup depicted in Fig. 3. The message $M$
is drawn uniformly from the message set $\mathcal{M} = \{1, 2, \ldots, |\mathcal{M}|\}$. Nature generates the state sequence
$S^n \in \mathcal{S}^n$ i.i.d. $\sim P_S$, independent of the message. A $(2^{nR}, n)$ code consists of:

• Probing Logic:

- Encoder probing logic $f_{A_e,i} : \mathcal{M} \to A_{e,i} \in \mathcal{A}_e$.

- Decoder probing logic $f_{A_d,i} : \mathcal{Y}^{i-1} \to A_{d,i} \in \mathcal{A}_d$, where the channel output $Y \in \mathcal{Y}$.

Further, the encoder and decoder actions are jointly cost constrained,

$$\Lambda(A_e^n, A_d^n) = \frac{1}{n}\sum_{i=1}^{n} \Lambda(A_{e,i}, A_{d,i}) \le \Gamma, \tag{62}$$

where $\Lambda(\cdot,\cdot)$ is the cost function and $\Gamma$ is the cost constraint. Given the nature-generated state sequence $S^n$, the message-dependent
encoder action sequence $A_e^n$, and the channel-output-dependent decoder action sequence $A_d^n$, the encoder acquires
partial state information $S_e^n \in \mathcal{S}_e^n$ (which we call CSIT, i.e., channel state information at the transmitter) and the decoder acquires
$S_d^n \in \mathcal{S}_d^n$ (which we call CSIR, i.e., channel state information at the receiver), through a DMC $P_{S_e,S_d|S,A_e,A_d}$.

• Encoding: $f_{e,i} : (M, S_e^i) \mapsto X_i \in \mathcal{X}$.

• Decoding: $f_d : (Y^n, S_d^n) \mapsto \hat{M} \in \{1, 2, \ldots, |\mathcal{M}|\}$.
The joint PMF on $(M, A_e^n, A_d^n, S^n, S_e^n, S_d^n, X^n, Y^n, \hat{M})$ induced by a given scheme is

$$P(m, a_e^n, a_d^n, s^n, s_e^n, s_d^n, x^n, y^n, \hat{m}) = \frac{1}{|\mathcal{M}|}\prod_{i=1}^{n} \mathbb{1}\{a_{d,i} = f_{A_d,i}(y^{i-1})\}\,\mathbb{1}\{a_{e,i} = f_{A_e,i}(m)\}\,P_S(s_i)\,P_{S_e,S_d|S,A_e,A_d}(s_{e,i}, s_{d,i}|s_i, a_{e,i}, a_{d,i}) \tag{63}$$

$$\times \prod_{i=1}^{n} \mathbb{1}\{x_i = f_{e,i}(m, s_e^i)\}\,P_{Y|X,S}(y_i|x_i, s_i) \times \mathbb{1}\{\hat{m} = f_d(y^n, s_d^n)\}. \tag{64}$$
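The order of events in (62)-(64) can be made explicit with a small simulation of one block. This Python sketch is illustrative only. All component functions are hypothetical choices rather than part of the paper's model:

- the encoder and decoder probing rules;
- the erasure-type probing channel, with erasure encoded as -1;
- the encoder;
- the binary additive channel $Y = X \oplus S$.

The point is that the decoder's action at time $i$ sees only $Y^{i-1}$, while the encoder's input at time $i$ uses $S_e^i$.

```python
import random
from typing import Callable, List, Sequence, Tuple

ERASURE = -1  # hypothetical encoding of "no state information"


def run_block(m: int, n: int, p_s: Sequence[float], rng: random.Random,
              enc_action: Callable[[int, int], int],
              dec_action: Callable[[int, Tuple[int, ...]], int],
              probe: Callable[[int, int, int, random.Random], Tuple[int, int]],
              encode: Callable[[int, int, Tuple[int, ...]], int],
              channel: Callable[[int, int, random.Random], int]):
    """Simulate n channel uses in the order implied by (63)-(64)."""
    a_e: List[int] = []
    a_d: List[int] = []
    s_e: List[int] = []
    s_d: List[int] = []
    y: List[int] = []
    for i in range(n):
        s = rng.choices(range(len(p_s)), weights=p_s)[0]  # S_i ~ P_S, i.i.d.
        a_e.append(enc_action(m, i))          # A_{e,i} = f_{A_e,i}(M)
        a_d.append(dec_action(i, tuple(y)))   # A_{d,i} = f_{A_d,i}(Y^{i-1}): past outputs only
        se, sd = probe(s, a_e[-1], a_d[-1], rng)  # (S_{e,i}, S_{d,i}) from the probing DMC
        s_e.append(se)
        s_d.append(sd)
        x = encode(m, i, tuple(s_e))          # X_i = f_{e,i}(M, S_e^i): causal CSIT
        y.append(channel(x, s, rng))          # Y_i ~ P_{Y|X,S}
    return a_e, a_d, s_e, s_d, y


def average_cost(a_e, a_d, cost) -> float:
    """Empirical cost (62): (1/n) sum_i Lambda(A_{e,i}, A_{d,i})."""
    return sum(cost(ae, ad) for ae, ad in zip(a_e, a_d)) / len(a_e)


# Hypothetical components, for illustration only.
seen_prefix_lengths = []


def enc_action(m, i):
    return (m >> i) & 1  # probe at time i iff bit i of m is 1


def dec_action(i, y_past):
    seen_prefix_lengths.append(len(y_past))
    return 1 if y_past and y_past[-1] == 1 else 0  # probe after observing a 1


def probe(s, ae, ad, rng):
    return (s if ae else ERASURE, s if ad else ERASURE)  # erasure-type probing


def encode(m, i, s_e_past):
    return 0 if s_e_past[-1] == ERASURE else s_e_past[-1]  # cancel the state if known


def channel(x, s, rng):
    return x ^ s  # binary additive state


a_e, a_d, s_e, s_d, y = run_block(m=0b10110, n=5, p_s=[0.5, 0.5], rng=random.Random(1),
                                  enc_action=enc_action, dec_action=dec_action,
                                  probe=probe, encode=encode, channel=channel)
print(a_e, a_d, y, average_cost(a_e, a_d, lambda ae, ad: ae + ad))
```

Whenever the encoder probes, the toy encoder cancels the state and the output is 0. The recorded prefix lengths confirm that the decoder action at time $i$ depended on exactly $i$ past outputs.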

Probing Capacity

Theorem 4: The cost-constrained 'probing capacity' for the scenario depicted in Fig. 3 is given by

$$C(\Gamma) = \max I(U; Y, S_d|A_d), \tag{65}$$

where the maximization is over all joint distributions of the form

$$P_{S,A_d,U,A_e,S_e,X,Y,S_d}(s, a_d, u, a_e, s_e, x, y, s_d) = P_S(s)P_{A_d}(a_d)P_{U|A_d}(u|a_d)\,\mathbb{1}\{a_e = g(u, a_d)\}\,P_{S_e,S_d|S,A_e,A_d}(s_e, s_d|s, a_e, a_d)\,\mathbb{1}\{x = f(u, s_e, a_d)\}\,P_{Y|X,S}(y|x,s), \tag{66}$$

for some $P_{A_d}$, $P_{U|A_d}$, $g$, $f$ such that $E[\Lambda(A_e, A_d)] \le \Gamma$ and $|\mathcal{U}| \le \min\{|\mathcal{Y}||\mathcal{S}_d||\mathcal{A}_d|,\ |\mathcal{S}||\mathcal{A}_d||\mathcal{A}_e||\mathcal{S}_e||\mathcal{S}_d||\mathcal{X}| + 4\}$.
Proof:

Achievability: Fix $P_{A_d}$, $P_{U|A_d}$, $g$, $f$ that achieve $C(\Gamma/(1+\epsilon))$. The encoder and decoder agree on a sequence $A_d^n$, drawn i.i.d. $\sim P_{A_d}$. By
arguments similar to those in the achievability proofs of the previous theorems, using the typical average lemma, the cost constraint is satisfied. Now, by
Theorem 3, if $A_{d,i} = a$ for all $i$, reliable communication is achieved if $R < I(U; Y, S_d|A_d = a)$. Hence, since the encoder and the decoder
both know $A_d^n$, we achieve $R < I(U; Y, S_d|A_d)$.

Converse: Suppose rate $R$ is achievable, and consider a sequence of $(2^{nR}, n)$ codes with $P_e^{(n)} \to 0$. Consider

$$nR = H(M) \tag{67}$$
$$= I(M; Y^n, S_d^n) + H(M|Y^n, S_d^n). \tag{68}$$

By Fano's inequality [14],

$$H(M|Y^n, S_d^n) \le 1 + P_e^{(n)} nR \le n\epsilon_n, \tag{69}$$

where $\epsilon_n \to 0$ as $n \to \infty$. Now consider

$$I(M; Y^n, S_d^n) = H(Y^n, S_d^n) - H(Y^n, S_d^n|M) \tag{70}$$
$$\overset{(a)}{=} \sum_{i=1}^{n} H(Y_i, S_{d,i}|Y^{i-1}, S_d^{i-1}, A_d^i) - \sum_{i=1}^{n} H(Y_i, S_{d,i}|Y^{i-1}, S_d^{i-1}, M, A_d^i, A_e^n) \tag{71}$$
$$\le \sum_{i=1}^{n} H(Y_i, S_{d,i}|A_{d,i}) - \sum_{i=1}^{n} H(Y_i, S_{d,i}|Y^{i-1}, S_e^{i-1}, S_d^{i-1}, M, A_d^i, A_e^n) \tag{72}$$
$$\overset{(b)}{=} \sum_{i=1}^{n} H(Y_i, S_{d,i}|A_{d,i}) - \sum_{i=1}^{n} H(Y_i, S_{d,i}|U_i, A_{d,i}) \tag{73}$$
$$= \sum_{i=1}^{n} I(U_i; Y_i, S_{d,i}|A_{d,i}) \tag{74}$$
$$\le \sum_{i=1}^{n} C(E[\Lambda(A_{e,i}, A_{d,i})]) \tag{75}$$
$$\overset{(c)}{\le} nC(E[\Lambda(A_e^n, A_d^n)]) \tag{76}$$
$$\overset{(d)}{\le} nC(\Gamma), \tag{77}$$

where

• (a) follows from the fact that $A_{d,i} = A_{d,i}(Y^{i-1})$ and $A_e^n = A_e^n(M)$.

• (72) follows from the fact that conditioning reduces entropy.

• (b) follows by defining $U_i = (M, Y^{i-1}, S_e^{i-1}, S_d^{i-1}, A_d^{i-1}, A_e^n)$.

• (c) follows from the fact that $C(\Gamma)$ is concave in $\Gamma$. This is proved in Appendix [X].

• (d) follows from the fact that $C(\Gamma)$ is nondecreasing in $\Gamma$, since a larger $\Gamma$ gives a larger feasible
region and hence a larger capacity.

We note the following relations, which justify (75):

• $A_{d,i} = A_{d,i}(Y^{i-1})$ is independent of $S_i$; this follows from the proof of Markov chain MC1 in Appendix B.

• We have the Markov chains

- $U_i - A_{d,i} - S_i$,

- $A_{e,i} - (U_i, A_{d,i}) - S_i$,

- $(S_{e,i}, S_{d,i}) - (S_i, A_{e,i}, A_{d,i}) - U_i$,

- $X_i - (U_i, S_{e,i}, A_{d,i}) - (A_{e,i}, S_i, S_{d,i})$,

- $Y_i - (X_i, S_i) - (U_i, A_{d,i}, A_{e,i}, S_{e,i}, S_{d,i})$.

These are proved in Appendix C.

• Since $U_i$ contains $A_e^n$, the maximization is unaffected if we replace $P_{A_e|U,A_d}$ with $\mathbb{1}\{A_e = g(U, A_d)\}$. Since $I(U; Y, S_d|A_d)$ is convex
in $P_{Y,S_d|U,A_d}$, and hence in $P_{X|U,S_e,A_d}$, the maximum is again unaffected if a general $P_{X|U,S_e,A_d}$ is
replaced with $X = f(U, S_e, A_d)$.

• Cardinality bounds on $U$: that $\mathcal{U}$ needs no more than $|\mathcal{Y}||\mathcal{S}_d||\mathcal{A}_d|$ elements follows from the arguments in [15]. Also, $U$ needs
$|\mathcal{S}||\mathcal{S}_e||\mathcal{A}_e||\mathcal{A}_d||\mathcal{S}_d||\mathcal{X}| - 1$ elements to preserve $P_{S,A_e,A_d,S_e,S_d,X}$ (which preserves $H(Y, S_d|A_d)$), one element to preserve
$H(Y, S_d|A_d, U)$, one element to preserve the independence of $S$ and $A_d$, and three more to preserve the Markov chains
$(U, A_e) - A_d - S$, $(S_e, S_d) - (S, A_e, A_d) - U$ and $X - (U, S_e, A_d) - (S, S_d, A_e)$.

The proof is then completed by combining (68), (69) and (77). ■
Note 4: We can consider a more general setting where the encoder and decoder probing logic depend on their respective past
state observations, i.e., the encoder takes actions $A_{e,i}(M, S_e^{i-1})$ while the decoder takes actions $A_{d,i}(Y^{i-1}, S_d^{i-1})$. The achievability
remains unchanged from Theorem 4, and it is easy to see that the converse also holds with $U_i = (M, Y^{i-1}, S_e^{i-1}, S_d^{i-1}, A_e^{i-1}, A_d^{i-1})$.

Note 5 (Computer Memory with Defects: Non-causal Probing at both Encoder and Decoder): Consider a computer
memory with defects, where what the encoder writes, $X$, and what the decoder reads, $Y$, are related through a
discrete memoryless channel $P_{Y|X,S}$, with the state $S$ modeling the defects. If there are no cost constraints on acquiring information
about the defects, the encoder and decoder are better off coding and decoding using the entire state sequence $S^n$, since it is available
before writing and reading the memory. We assume that neither the writing nor the reading operation changes the
state. However, when acquiring this state information at the encoder as well as at the decoder is cost constrained, the encoder can
take actions $A_{e,i}(M)$ to get partial state information $S_e^n$ and then write $X_i(M, S_e^n)$, while the decoder can wait for the entire memory
to be written and then take actions $A_{d,i}(Y^n)$. It then obtains its side information $S_d^n$. Hence the setup remains similar
to that depicted in Fig. 3. The only difference from the setup considered above is that the encoder now uses the partial state information
(CSIT) non-causally to generate input symbols, i.e., $f_{e,i} : (M, S_e^n) \mapsto X_i \in \mathcal{X}$, while the decoder takes actions based on the entire channel
output sequence, i.e., $f_{A_d,i} : \mathcal{Y}^n \to A_{d,i} \in \mathcal{A}_d$. Also, to avoid issues of instantaneous dependency, we must have

$$P_{S_e,S_d|S,A_e,A_d} = P_{S_e|S,A_e} \times P_{S_d|S,S_e,A_e,A_d}. \tag{78}$$



Equivalence to Relay Problem 

The above problem is in general a hard one. Consider a special case where $A_e$ is binary, with cost function $\Lambda(A_e, A_d) = \Lambda(A_e)$. For this case, even the zero-cost and unit-cost corner cases are open problems, with only bounds known. When the cost is unity, this is the case of a relay channel with states and infinite lookahead, with the states known non-causally to the encoder. For the standard relay channel (without infinite lookahead) with states known to the encoder, Zaidi and Vandendorpe [16] lower bound the capacity. For zero cost, the system is a special case of the 'Relay Channel with Infinite Lookahead'. We conclude by showing the equivalence of this problem at zero cost to that of the relay with infinite lookahead, as depicted in Fig. 5 and Table II.



[Figure: a channel state generator produces $S_i$; the encoder maps $M \in \{1 : 2^{nR}\}$ to $X_i(M)$ over the DMC; the decoder takes actions $A_{d,i}(Y^n)$ and observes states through $P_{S_d|S,A_d}$.]

Fig. 4. The decoder takes actions dependent upon the entire observed channel output sequence and uses them to acquire partial channel state information. The encoder has no knowledge of the channel states.



V. Numerical Examples 

A. Discrete Channels 

1) [Non-causal Probing]: To Observe or Not to Observe the Channel State at the Encoder, with the Complete Channel State at the Decoder:

Example 1 (Binary States, S(α) Channel and Z(β) Channel): Consider the communication system shown in Fig. 6 with binary input and output. The decoder knows the state completely. Actions are binary and correspond to observing or not observing the state at the encoder. The cost function is $\Lambda(a) = a$ for actions $a \in \{0, 1\}$. We compute the capacity using Theorem 1, with $S_e \in \{*, 0, 1\}$ and $\alpha = \beta = \epsilon = 0.5$. We assume the following:

$$p_1 = P(X = 0 \mid S_e = *), \quad (79)$$
$$p_2 = P(X = 0 \mid S_e = 0), \quad (80)$$
$$p_3 = P(X = 0 \mid S_e = 1). \quad (81)$$

[Figure: the encoder maps $M \in \{1 : 2^{nR}\}$ to $X$; the relay encoder observes $Y_1$ through $P_{Y_1|X}$ and sends $X_1$; the decoder observes $Y$ through $P_{Y|X,X_1,Y_1}$.]

Fig. 5. Equivalence of the setting in Fig. 4 with the Relay with Infinite Lookahead.

TABLE II
Equivalence of the setting in Fig. 4 with the Relay with Infinite Lookahead [13].

Relay with Infinite Lookahead | Decoder Probing in Fig. 4
$X$ | $X$
$X_1$ | $A_d$
$Y_1$ | $Y$
$Y$ | $(Y, S_d)$
$\hat{M}(Y^n)$ | $\hat{M}(Y^n, S_d^n)$

Since $C(\Gamma)$ is non-decreasing in $\Gamma$, the optimal action distribution satisfies $P(A = 1) = \Gamma$. We obtain, for $\Gamma \in [0, 1]$,

$$C(\Gamma) = \max_{p_1,p_2,p_3 \in [0,1]} \Big\{ \epsilon\, h_2\big(\alpha((1-\Gamma)p_1 + \Gamma p_2)\big) - \epsilon\big((1-\Gamma)p_1 + \Gamma p_2\big) h_2(\alpha) + (1-\epsilon)\, h_2\big(\beta((1-\Gamma)(1-p_1) + \Gamma(1-p_3))\big) - (1-\epsilon)\big((1-\Gamma)(1-p_1) + \Gamma(1-p_3)\big) h_2(\beta) \Big\}. \quad (82)$$

We compute the above expression numerically (Fig. 7).

Fig. 6. Example 1.

Fig. 7. Cost-capacity trade-off for Example 1. Time sharing is strictly sub-optimal.
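Eq. (82) can be evaluated by brute force. The following is a minimal Python/NumPy sketch, assuming a uniform grid of step 0.01 on $(p_1, p_2, p_3)$. This is our choice of method, not necessarily the procedure used for Fig. 7, and the function names `h2` and `capacity_example1` are hypothetical. A grid search can only under-estimate the maximum, so the returned values are lower estimates of $C(\Gamma)$ up to the grid resolution.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits; arguments are clipped away from 0 and 1 to avoid log(0)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def capacity_example1(gamma, alpha=0.5, beta=0.5, eps=0.5, steps=100):
    """Grid-search evaluation of Eq. (82) over (p1, p2, p3) in [0, 1]^3."""
    grid = np.linspace(0.0, 1.0, steps + 1)
    p1, p2, p3 = np.meshgrid(grid, grid, grid, indexing="ij")
    q0 = (1 - gamma) * p1 + gamma * p2              # P(X = 0 | S = 0)
    q1 = (1 - gamma) * (1 - p1) + gamma * (1 - p3)  # P(X = 1 | S = 1)
    rate = (eps * (h2(alpha * q0) - q0 * h2(alpha))
            + (1 - eps) * (h2(beta * q1) - q1 * h2(beta)))
    return float(rate.max())

if __name__ == "__main__":
    for gamma in (0.0, 0.1, 0.2, 0.5, 1.0):
        print(f"Gamma = {gamma:.1f}: C = {capacity_example1(gamma):.4f} bits")
```

With $\alpha = \beta = \epsilon = 0.5$, the unit-cost value equals $\log_2 1.25 \approx 0.3219$ bits, and the sketch returns the same value already at $\Gamma = 0.2$, in line with the cut-off discussed in Note 6.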

Note 6 (Cut-off point ≈ 0.2 in Fig. 7): A surprising observation from this example is that, in order to achieve the maximum capacity (which is attained at $\Gamma = 1$), one needs to observe only a fraction ≈ 0.2 of the states. This threshold can also be derived theoretically. Essentially, we find the range of $\Gamma \in [0, 1]$ for which the capacity-achieving joint distribution in $C(\Gamma)$ induces exactly the same marginals $P_{X|S}$ as when the cost is unity. Let $p_1^*$, $p_2^*$ and $p_3^*$ be the optimal distributions for cost $\Gamma$ as in Eq. (82). The marginals are

$$P(X = 0 \mid S = 0) = (1-\Gamma)p_1^* + \Gamma p_2^*, \quad (83)$$
$$P(X = 0 \mid S = 1) = (1-\Gamma)p_1^* + \Gamma p_3^*. \quad (84)$$

For $\Gamma = 1$, we can easily compute $P(X = 0 \mid S = 0) = 0.4$ and $P(X = 0 \mid S = 1) = 0.6$. Therefore, for the marginals to be the same,

$$(1-\Gamma)p_1^* + \Gamma p_2^* = 0.4, \quad (85)$$
$$(1-\Gamma)p_1^* + \Gamma p_3^* = 0.6, \quad (86)$$

or, subtracting (85) from (86),

$$\Gamma(p_3^* - p_2^*) = 0.2. \quad (87)$$

Since $p_2^*, p_3^* \in [0, 1]$, it is easy to see that if the cost $\Gamma \ge 0.2$, we can find $(p_1^*, p_2^*, p_3^*)$ such that $C(\Gamma) = C(1)$. At $\Gamma = 0.2$, the optimal scheme is $X = S_e \oplus 1$ if $S_e \neq *$, and $X \sim \mathrm{Bern}(0.5)$ otherwise.
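The cut-off derivation in Note 6 reduces to a few lines of arithmetic. Below is a minimal Python sketch, assuming the unit-cost optimal marginals are supplied as inputs; the helper name `minimal_probing_fraction` is hypothetical. Following Eq. (87), it places $p_2^*$ and $p_3^*$ at opposite extremes, which gives the smallest $\Gamma$, and then solves Eqs. (85)-(86) for $p_1^*$.

```python
def minimal_probing_fraction(q_s0, q_s1):
    """Smallest Gamma solving Eqs. (85)-(86) with p1, p2, p3 in [0, 1].

    q_s0, q_s1: unit-cost optimal marginals P(X=0|S=0) and P(X=0|S=1).
    Returns (Gamma, p1, p2, p3).
    """
    # Eq. (87): Gamma * (p3 - p2) = q_s1 - q_s0, and |p3 - p2| <= 1.
    gamma = abs(q_s1 - q_s0)
    low = min(q_s0, q_s1)
    # With p2, p3 at opposite extremes, the common term (1 - Gamma) * p1
    # must equal the smaller of the two marginals.
    p2, p3 = (0.0, 1.0) if q_s0 <= q_s1 else (1.0, 0.0)
    p1 = low / (1 - gamma) if gamma < 1 else 0.5  # p1 is irrelevant when Gamma = 1
    return gamma, p1, p2, p3

if __name__ == "__main__":
    print(minimal_probing_fraction(0.4, 0.6))
```

For the marginals 0.4 and 0.6 of Example 1, it returns $\Gamma \approx 0.2$, $p_1^* = 0.5$, $p_2^* = 0$ and $p_3^* = 1$, i.e., the scheme $X = S_e \oplus 1$ when $S_e \neq *$ and $\mathrm{Bern}(0.5)$ otherwise.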

2) [Causal Probing]: To Observe or Not to Observe the Channel State at the Encoder, with No Channel State at the Decoder:
Example 2 (Binary States, S(α) Channel and BSC(δ)): Consider the communication system shown in Fig. 8 with binary input and output, with $\epsilon = 0.5$, $\alpha = 0.1$ and $\delta = 0.3$. Here the states are not known to the decoder, and the encoder uses the partial state information causally to generate channel input symbols. Actions are binary with cost $\Lambda(a) = a$: $A = 1$ corresponds to an observation of the channel state, while $A = 0$ corresponds to a lack of an observation. The evaluation of the capacity expression involves an auxiliary random variable, so we numerically compute a lower bound on the capacity using Theorem 2, as shown in Fig. 9. Here too, time sharing is clearly not optimal.

Note 7: This example exhibits the same interesting phenomenon as Example 1: we only need to observe a fraction of roughly 0.5 of the states to obtain the capacity at unit cost. This can be reasoned in the same manner as for Example 1.

Fig. 8. Example 2.

Fig. 9. Cost-capacity trade-off for Example 2. The dotted straight line is obtained by time sharing between the zero-cost and unit-cost capacities (Scheme 1). Time sharing between a scheme for which $A = g(U) = U$ in Theorem 2 (call it Scheme 2) and Scheme 1 gives a lower bound on the capacity, indicated by the solid line. It is evident that the naive Scheme 1 (time sharing between the extreme capacities at zero and unit cost) is strictly sub-optimal.

Example 3 (Binary States, Multiplier Channel with Power Constraints): Consider a multiplier channel with binary inputs, outputs and states, $Y = S \cdot X$, where $S \sim \mathrm{Bern}(0.5)$. Again, actions are binary with $\Lambda(a) = a$; $A = 1$ corresponds to an observation of the channel state, while $A = 0$ corresponds to a lack of an observation. Let

$$p_* = P(X = 1 \mid S_e = *), \quad (88)$$
$$p_0 = P(X = 1 \mid S_e = 0), \quad (89)$$
$$p_1 = P(X = 1 \mid S_e = 1). \quad (90)$$

We see that the capacity under the power constraint

$$P(X = 1) \le P_0, \quad P_0 \in [0, 1], \quad (91)$$

is

$$C(\Gamma, P_0) = \max h_2\big[(1-\Gamma)p_* + \Gamma p_1\big] \quad (92)$$

subject to

$$(1-\Gamma)p_* + \frac{\Gamma}{2}(p_0 + p_1) = P_0. \quad (93)$$

For $P_0 = 0.25$, we have




Fig. 10. Cost-capacity trade-off for Example 3 for $P_0 = 0.25$. The dotted straight line is obtained by time sharing between the zero-cost and unit-cost capacities.
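The trade-off in Eqs. (92)-(93) can be checked numerically. Below is a minimal Python/NumPy sketch under our own assumptions. We impose (93) as the inequality of (91), and we fix $p_0 = 0$: the input has no effect on $Y = S \cdot X$ when $S = 0$, so power spent there is wasted. We then grid-search $(p_*, p_1)$. The comparison function `closed_form_example3` is our own derivation from (92)-(93), not a formula stated in the text. The argument of $h_2$ equals the used power plus $\Gamma p_1/2$, and also twice the used power minus $(1-\Gamma)p_*$, so it can reach at most $\min\{P_0 + \Gamma/2,\, 2P_0\}$. All function names are hypothetical.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits; arguments are clipped away from 0 and 1."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def capacity_example3(gamma, power=0.25, steps=1000):
    """Grid search of Eq. (92) with P(X = 1) <= power, taking p0 = 0."""
    grid = np.linspace(0.0, 1.0, steps + 1)
    p_star, p1 = np.meshgrid(grid, grid, indexing="ij")
    q = (1 - gamma) * p_star + gamma * p1                # argument of h2 in Eq. (92)
    used_power = (1 - gamma) * p_star + gamma * p1 / 2   # P(X = 1), Eq. (93) with p0 = 0
    rate = np.where(used_power <= power + 1e-12, h2(q), -np.inf)
    return float(rate.max())

def closed_form_example3(gamma, power=0.25):
    """Our derived closed form: h2 evaluated at the largest reachable q, capped at 1/2."""
    return float(h2(min(power + gamma / 2, 2 * power, 0.5)))

if __name__ == "__main__":
    for gamma in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(gamma, round(capacity_example3(gamma), 4), round(closed_form_example3(gamma), 4))
```

Under this reading, for $P_0 = 0.25$ the capacity is $h_2(0.25 + \Gamma/2)$ for $\Gamma \le 0.5$ and 1 bit beyond. Observing half of the states would then already suffice, which is consistent with the concave curve lying above the time-sharing line in Fig. 10.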



B. Continuous Channels 



1) 'Learning' to Write on a Dirty Paper: Using standard arguments, it can be shown that the capacity results carry over to the case of continuous channels with power constraints on the input symbols. Let us recall the setting of Dirty Paper Coding. Costa [17] considered the communication system in Fig. 11. The output of the channel is given as $Y^n = X^n(M, S^n) + S^n + Z^n$, where

• The channel state or interference $S^n$ is i.i.d., $S^n \sim \mathcal{N}(0, QI)$, independent of the i.i.d. noise $Z^n \sim \mathcal{N}(0, NI)$.

• The channel state or interference is known to the encoder non-causally. The encoder hence generates channel inputs $X^n(M, S^n)$, which are cost constrained, i.e., $\frac{1}{n}\sum_{i=1}^{n} X_i^2 \le P$.

• The decoder has no knowledge of the channel state or interference.

It was shown that the capacity of this channel is $C(P/N) = \frac{1}{2}\log_2(1 + P/N)$, which equals the capacity of a standard Gaussian channel with signal-to-noise ratio $P/N$. This is strictly larger than the capacity when $S^n$ is unknown to both the encoder and the decoder, i.e., $\frac{1}{2}\log_2(1 + P/(Q + N))$.

[Figure: the encoder sends $X^n(M, S^n)$; the state $S^n \sim \mathcal{N}(0, QI)$ and the noise $Z^n \sim \mathcal{N}(0, NI)$ are added; the decoder outputs $\hat{M}(Y^n)$.]

Fig. 11. Dirty Paper Coding as in [17].

We now consider the setting in Fig. 12. In Writing on Dirty Paper, it was assumed that the interference or channel state was completely available, but this might not be true in real systems: one might have to pay a price to acquire this information. Hence, in contrast to writing on a paper where the intensity and positions of all dirt spots are known, we have to take actions to learn where the paper is most dirty, hence the name Learning to Write on a Dirty Paper. Actions are binary, with cost function $\Lambda(a) = a$. Here also $A = 1$ corresponds to an observation of the channel state, while $A = 0$ corresponds to a lack of an observation. Also,

$$S_e = h(S, A) = * \quad \text{if } A = 0, \quad (96)$$
$$S_e = h(S, A) = S \quad \text{if } A = 1, \quad (97)$$

where $*$ stands for an erasure, i.e., no information.

Fig. 12. Learning to write on a Dirty Paper.
Invoking Theorem 3, we have the capacity

$$C(\Gamma, P) = \max\,[I(U;Y) - I(U;S_e|A)] = \max\,[I(A,U;Y) - I(U;S_e|A)], \quad (98)$$

where the maximization is over joint distributions

$$P_{A,U,S,S_e,X,Y} = P_A P_S\, \mathbb{1}\{S_e = h(S,A)\}\, P_{U|A,S_e}\, \mathbb{1}\{X = f(U,S_e)\}\, P_{Y|X,S} \quad (99)$$

such that $P(A = 1) \le \Gamma$ and $E[X^2] \le P$. We give a lower bound on this capacity by considering a simple power-splitting achievable scheme. Let us assume $X|(A = 0) \sim \mathcal{N}(0, P_1)$ and $X|(A = 1) \sim \mathcal{N}(0, P_2)$. Clearly $C(\Gamma, P)$ is maximized when $P(A = 1) = \Gamma$. Therefore, the power constraint gives

$$(1-\Gamma)P_1 + \Gamma P_2 \le P. \quad (100)$$

Further, we assume that, given the action $A$, the channel input $X$ is independent of $(S, Z)$. Let

$$U|(A = 0) = X|(A = 0), \quad (101)$$
$$U|(A = 1) = X|(A = 1) + a(P_2)S, \quad (102)$$

where $a(P_2) = P_2/(P_2 + N)$. Since $Y = X + S + Z$, we have

$$Y|A = 0 \sim g_0 = \mathcal{N}(0, P_1 + Q + N), \quad (103)$$
$$Y|A = 1 \sim g_1 = \mathcal{N}(0, P_2 + Q + N), \quad (104)$$
$$Y \sim g = (1-\Gamma)\mathcal{N}(0, P_1 + Q + N) + \Gamma\mathcal{N}(0, P_2 + Q + N). \quad (105)$$
Considering this distribution gives the following lower bound on the capacity:

$$C_{\text{lower}} = \max_{P_1,P_2}\,[I(A,U;Y) - I(U;S_e|A)] \quad (106)$$
$$= \max_{P_1,P_2}\,[I(A;Y) + I(U;Y|A) - I(U;S_e|A)] \quad (107)$$
$$\overset{(a)}{=} \max_{P_1,P_2}\,\big[h(g) - (1-\Gamma)h(g_0) - \Gamma h(g_1) + (1-\Gamma)I(X;Y|A=0) + \Gamma\big(I(U;Y|A=1) - I(U;S|A=1)\big)\big]$$
$$\overset{(b)}{=} \max_{P_1,P_2}\,\big[h(g) - (1-\Gamma)h(g_0) - \Gamma h(g_1) + (1-\Gamma)C(P_1/(Q+N)) + \Gamma C(P_2/N)\big], \quad (108)$$

where the maximization is subject to (100), and

• (a) follows from the fact that $S_e$ is just an erasure for $A = 0$, while for $A = 1$ it is equal to $S$. Here $h(g)$ denotes the differential entropy of a continuous random variable with distribution $g$.

• (b) follows from the fact that when $A = 0$,

$$I(X;Y|A=0) = h(Y|A=0) - h(Y|X,A=0) \quad (109)$$
$$= h(\mathcal{N}(0, P_1 + Q + N)) - h(\mathcal{N}(0, Q + N)) \quad (110)$$
$$= \tfrac{1}{2}\log_2(1 + P_1/(Q+N)) = C(P_1/(Q+N)), \quad (111)$$

while for $A = 1$, following steps similar to those in [17, Eqs. 3-7], we obtain

$$I(U;Y|A=1) - I(U;S|A=1) = \tfrac{1}{2}\log_2(1 + P_2/N) = C(P_2/N). \quad (112)$$

Fig. 13 shows the plot of $C_{\text{lower}}$ against $\Gamma$ for $P = Q = N = 1$, which indeed performs better than naive time sharing between $C(P/N)$ and $C(P/(Q+N))$.

Fig. 13. Power-splitting lower bound on the capacity for Learning to Write on a Dirty Paper in Fig. 12.
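The bound (108) involves the differential entropy $h(g)$ of a two-component Gaussian mixture, which has no closed form. Below is a minimal Python/NumPy sketch that evaluates (108). It rests on our own numerical choices: trapezoidal integration on a truncated grid, constraint (100) met with equality, and a one-dimensional search over $P_1$. The function names are hypothetical. The search includes $P_1 = P_2 = P$, where $I(A;Y) = 0$ and (108) reduces to time sharing, so the result can never fall below the time-sharing line.

```python
import numpy as np

def gaussian_entropy_bits(var):
    """Differential entropy of N(0, var) in bits."""
    return 0.5 * np.log2(2 * np.pi * np.e * var)

def mixture_entropy_bits(w1, var0, var1, num=20001):
    """h(g) in bits for g = (1 - w1) N(0, var0) + w1 N(0, var1), by the trapezoidal rule."""
    half_width = 12.0 * np.sqrt(max(var0, var1))
    y = np.linspace(-half_width, half_width, num)

    def pdf(var):
        return np.exp(-y**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    g = (1 - w1) * pdf(var0) + w1 * pdf(var1)
    integrand = -g * np.log2(np.maximum(g, 1e-300))
    dy = y[1] - y[0]
    return dy * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

def c_awgn(snr):
    """C(x) = 0.5 * log2(1 + x)."""
    return 0.5 * np.log2(1 + snr)

def c_lower_dirty_paper(gamma, P=1.0, Q=1.0, N=1.0, grid=200):
    """Power-splitting bound (108) for 0 < gamma < 1, with (100) met with equality."""
    p1_values = np.append(np.linspace(0.0, P / (1 - gamma), grid + 1), P)  # include P1 = P2 = P
    best = -np.inf
    for p1 in p1_values:
        p2 = max((P - (1 - gamma) * p1) / gamma, 0.0)
        var0, var1 = p1 + Q + N, p2 + Q + N  # variances of g0 and g1, Eqs. (103)-(104)
        # I(A;Y) = h(g) - (1 - Gamma) h(g0) - Gamma h(g1)
        i_ay = (mixture_entropy_bits(gamma, var0, var1)
                - (1 - gamma) * gaussian_entropy_bits(var0)
                - gamma * gaussian_entropy_bits(var1))
        rate = i_ay + (1 - gamma) * c_awgn(p1 / (Q + N)) + gamma * c_awgn(p2 / N)
        best = max(best, rate)
    return float(best)

if __name__ == "__main__":
    for gamma in (0.25, 0.5, 0.75):
        ts = (1 - gamma) * c_awgn(0.5) + gamma * c_awgn(1.0)  # P = Q = N = 1
        print(gamma, round(c_lower_dirty_paper(gamma), 4), round(ts, 4))
```

For $P = Q = N = 1$, the bound should lie strictly above time sharing between $C(P/(Q+N)) \approx 0.2925$ and $C(P/N) = 0.5$ bits. Moving power toward the observed slots, where dirty paper coding removes the interference, is what drives the gain.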



2) Fading Channels with Power Control: We revisit the setting of fading channels with encoder and decoder state information as in [11], but now the encoder takes actions to acquire the channel state from the receiver's state estimate, while the decoder already knows the channel state. This is depicted in Fig. 14. Here $g[i]$ denotes the i.i.d. channel states, which take values in the finite set $\mathcal{S} = \{g_1, g_2\}$ with equal probability, and $n[i]$ is i.i.d. Gaussian noise $\sim \mathcal{N}(0, N/2)$. The bandwidth for communication is $B$. $\gamma_1 = g_1 P/(NB)$ and $\gamma_2 = g_2 P/(NB)$ are the signal-to-noise ratios, such that $\gamma_1 < 1$. Actions are binary and correspond to observing or not observing the state at the encoder, with cost function $\Lambda(a) = a$ and cost constraint $\Gamma$. $f$ is defined as in Theorem 1. From the results in [11], we know the following.



Capacity when only the decoder knows the state information:

$$C(0) = \frac{B}{2}\log_2(1 + \gamma_1) + \frac{B}{2}\log_2(1 + \gamma_2). \quad (113)$$

Capacity when the encoder also knows the channel state (possibly through noiseless feedback from the decoder) in addition to the decoder:

$$C(1) = \frac{B}{2}\log_2(1 + 2\gamma_2). \quad (114)$$

The above capacities form the extreme cases of zero and unit cost, respectively, for the communication system in Fig. 14. Using Theorem 1, we have the capacity for the communication system in Fig. 14 with bandwidth $B$ as



$$C = \max_{P_A,\, f_{X|S_e}} 2B\, I(X;Y|S) = \max_{P_A,\, f_{X|S_e}} 2B\,\big[h(Y|S) - h(\mathcal{N}(0, NB))\big], \quad (115)$$

such that $E[\Lambda(A)] \le \Gamma$ and $E[X^2] \le P$. Clearly the maximum is attained for $P(A = 1) = \Gamma$. To obtain a lower bound, we assume the following:

$$X|(S_e = *) \sim \mathcal{N}(0, P_*), \quad (116)$$
$$X|(S_e = g_1) \sim \mathcal{N}(0, P_1), \quad (117)$$
$$X|(S_e = g_2) \sim \mathcal{N}(0, P_2). \quad (118)$$

This implies

$$Y|(S = g_1) \sim (1-\Gamma)\mathcal{N}(0, NB + P_* g_1) + \Gamma\mathcal{N}(0, NB + P_1 g_1),$$
$$Y|(S = g_2) \sim (1-\Gamma)\mathcal{N}(0, NB + P_* g_2) + \Gamma\mathcal{N}(0, NB + P_2 g_2),$$

with the power constraint

$$E[X^2] = (1-\Gamma)P_* + \frac{\Gamma}{2}(P_1 + P_2) \le P.$$

Hence a lower bound on the capacity is

$$C_{\text{lower}}(\Gamma, P) = 2B \max_{P_*, P_1, P_2} \Big[\frac{1}{2}h(f_{Y|S=g_1}) + \frac{1}{2}h(f_{Y|S=g_2}) - h(\mathcal{N}(0, NB))\Big]$$

subject to $(1-\Gamma)P_* + \frac{\Gamma}{2}(P_1 + P_2) \le P$.

We plot $C_{\text{lower}}(\Gamma, P)$ as a function of $\Gamma$ for $P = N = 1$ and $g_1 = 0.01$, $g_2 = 1$ in Fig. 15.

Fig. 14. Fading channels with the encoder taking actions to acquire the channel state for adaptive power control.

Fig. 15. Lower bound on the capacity of the fading channel communication system in Fig. 14. Time sharing is evidently highly sub-optimal.
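The lower bound above can be evaluated in the same way as the dirty paper example, since each conditional output density is a two-component Gaussian mixture. Below is a minimal Python/NumPy sketch under stated assumptions. We take $B = 1$, because the bandwidth used for Fig. 15 is not given here. We spend the full power budget, which still yields a valid lower bound. We search a coarse grid over how the budget is split between $P_*$ and $(P_1, P_2)$. The function names are hypothetical.

```python
import numpy as np

def gauss_entropy_bits(var):
    """Differential entropy of N(0, var) in bits."""
    return 0.5 * np.log2(2 * np.pi * np.e * var)

def mix_entropy_bits(weight1, var0, var1, num=4001):
    """h(f) in bits for f = (1 - weight1) N(0, var0) + weight1 N(0, var1), trapezoidal rule."""
    half_width = 12.0 * np.sqrt(max(var0, var1))
    y = np.linspace(-half_width, half_width, num)

    def pdf(var):
        return np.exp(-y**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    f = (1 - weight1) * pdf(var0) + weight1 * pdf(var1)
    integrand = -f * np.log2(np.maximum(f, 1e-300))
    dy = y[1] - y[0]
    return dy * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

def fading_lower_bound(gamma, P=1.0, N=1.0, B=1.0, g=(0.01, 1.0), steps=30):
    """Grid search of C_lower(Gamma, P) over (P_star, P_1, P_2); assumes 0 < gamma <= 1."""
    noise_var = N * B
    h_noise = gauss_entropy_bits(noise_var)
    fractions = np.linspace(0.0, 1.0, steps + 1)
    best = -np.inf
    for t in (fractions if gamma < 1 else [0.0]):
        # A share t of the budget goes to unobserved slots: (1 - gamma) * P_star = t * P.
        p_star = t * P / (1 - gamma) if gamma < 1 else 0.0
        remaining = 2 * (1 - t) * P / gamma  # P_1 + P_2 so that the constraint is tight
        for s in fractions:
            powers = (s * remaining, (1 - s) * remaining)  # (P_1, P_2)
            h_cond = [mix_entropy_bits(gamma, noise_var + p_star * gj, noise_var + pj * gj)
                      for gj, pj in zip(g, powers)]
            rate = 2 * B * (0.5 * h_cond[0] + 0.5 * h_cond[1] - h_noise)
            best = max(best, rate)
    return float(best)

if __name__ == "__main__":
    for gamma in (0.25, 0.5, 0.75, 1.0):
        print(gamma, round(fading_lower_bound(gamma), 4))
```

As a sanity check under these assumptions, the grid point $P_* = P_1 = P_2 = P$ reproduces $C(0)$ of (113), and at $\Gamma = 1$ the point $P_1 = 0$, $P_2 = 2P$ reproduces $C(1)$ of (114).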
VI. Conclusion

In this work, we obtain the 'Probing Capacity' of systems characterized as follows:

• The channel is a DMC with i.i.d. states.

• The encoder takes costly actions and probes the channel for channel state information. This may be used causally or non-causally to generate channel input symbols.

• The decoder takes costly actions and probes the channel to obtain state information, which is then used to construct the message estimate.

We also worked out examples of discrete and continuous channels in cases where only the encoder probed the channel for states. We not only showed that a naive time-sharing scheme is strictly sub-optimal, but also showed a pleasing phenomenon (see Example 1 in Section V) where one needs to observe only a fraction of the states to obtain the maximum rate of transmission, i.e., the rate when the cost of state observation at the encoder is not constrained.

As directions for future work, the following are important questions and conjectures worth pursuing:

1. What if the encoder actions depend on past sampled states, i.e., $A_{e,i} = A_{e,i}(M, S_e^{i-1})$, for the case when the partial state information is to be used non-causally? Can the capacity be increased?

2. What about the probing capacity for channels with memory?

3. Does the example on 'Learning to write on a dirty paper' also exhibit the pleasing phenomenon where we observe only a fraction of the states and still achieve Costa's dirty paper coding capacity, $C(P/N)$?

4. What if we take actions to sample or not sample the feedback at the encoder or decoder for channels with memory?

Some of the results concerning sampling or not sampling the feedback for finite state channels (FSC) have been characterized in [18], while the rest are under investigation.
Appendix A
Concavity of Capacity in Cost

We prove the concavity of the cost-constrained capacity in Theorem 4 by a concavification argument. Consider the 'concavification' of the capacity in Theorem 4,

$$C_Q(\Gamma) = \max\, I(U;Y,S_d|A_d,Q), \quad (123)$$

where the maximization is over all joint distributions of the form

$$P_{Q,S,A_d,U,A_e,S_e,X,Y,S_d}(q,s,a_d,u,a_e,s_e,x,y,s_d) = P_Q(q)P_S(s)P_{A_d|Q}(a_d|q)P_{U|A_d,Q}(u|a_d,q)\,\mathbb{1}\{a_e = g(u,a_d,q)\}\,P_{S_e|S,A_e}(s_e|s,a_e)\,\mathbb{1}\{x = f(u,s_e,a_d,q)\}\,P_{Y|X,S}(y|x,s)\,P_{S_d|S,A_d}(s_d|s,a_d), \quad (124)$$

for some $P_Q$, $P_{A_d|Q}$, $P_{U|A_d,Q}$, $g$, $f$ such that $E[\Lambda(A_e,A_d)] \le \Gamma$. Clearly $C_Q(\Gamma) \ge C(\Gamma)$. It remains to prove $C_Q(\Gamma) \le C(\Gamma)$:

$$I(U;Y,S_d|A_d,Q) = H(Y,S_d|A_d,Q) - H(Y,S_d|U,A_d,Q) \quad (125)$$
$$\le H(Y,S_d|A_d) - H(Y,S_d|U,A_d,Q) \quad (126)$$
$$= I(U';Y,S_d|A_d), \quad (127)$$

where the inequality follows because conditioning reduces entropy, and the last equality follows from defining $U' = (U,Q)$. The proof is completed by noting that the joint distribution of $(S, A_d, U', A_e, S_e, X, Y, S_d)$ is of the same form as that of $(S, A_d, U, A_e, S_e, X, Y, S_d)$.



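Appendix B
Proof of Markov Chain $X_i - (S_{e,i}, A_i) - S_i$

Since $X_i = X_i(M, S_e^i)$, it suffices to prove $(M, S_e^{i-1}) - (S_{e,i}, A_i) - S_i$. We observe that the joint distribution can be factorized as

$$P(M, A^n, S^n, S_e^n) = P(M)\prod_{i=1}^{n} P(S_i)P(A_i|M)P(S_{e,i}|S_i, A_i) \quad (128)$$
$$= \Phi_1(A^{n\setminus i}, M, S_e^{n\setminus i}, S^{n\setminus i}, A_i)\,\Phi_2(S_i, S_{e,i}, A_i) \quad (129)$$
$$= \Phi_1(A^{n\setminus i}, M, S_e^{n\setminus i}, S^{n\setminus i}, A_i, S_{e,i})\,\Phi_2(S_i, S_{e,i}, A_i), \quad (130)$$

which implies the Markov chain $(A^{n\setminus i}, M, S_e^{n\setminus i}, S^{n\setminus i}) - (S_{e,i}, A_i) - S_i$, which in turn implies $(M, S_e^{i-1}) - (S_{e,i}, A_i) - S_i$.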

Appendix C
Proof of Markov Chains in Theorem 4

We will prove the following Markov chains:

MC1 $U_i - A_{d,i} - S_i$.

MC2 $A_{e,i} - (U_i, A_{d,i}) - S_i$.

MC3 $(S_{e,i}, S_{d,i}) - (S_i, A_{e,i}, A_{d,i}) - U_i$.

MC4 $X_i - (U_i, S_{e,i}, A_{d,i}) - (A_{e,i}, S_i, S_{d,i})$.

MC5 $Y_i - (X_i, S_i) - (U_i, A_{d,i}, A_{e,i}, S_{e,i}, S_{d,i})$.

MC3 and MC5 follow from the DMC assumption in the problem definition. Now, for the rest, consider the probability distribution induced by the given encoding and decoding scheme,

$$P_{M,A_e^n,S^n,S_e^n,X^n,Y^n,A_d^n,S_d^n}(m,a_e^n,s^n,s_e^n,x^n,y^n,a_d^n,s_d^n) = \frac{1}{2^{nR}}\mathbb{1}\{a_e^n = A_e^n(m)\}\prod_{i=1}^{n}\mathbb{1}\{a_{d,i} = A_{d,i}(y^{i-1})\}P_S(s_i)P_{S_e,S_d|S,A_e,A_d}(s_{e,i},s_{d,i}|s_i,a_{e,i},a_{d,i}) \times \prod_{i=1}^{n}\mathbb{1}\{x_i = X_i(m,s_e^i)\}P_{Y|X,S}(y_i|x_i,s_i). \quad (131)$$

Averaging over $(S_{i+1}^n, S_{e,i}^n, X_i^n, Y_i^n, A_{d,i+1}^n, S_{d,i}^n)$, we get

$$P_{M,A_e^n,S^i,S_e^{i-1},X^{i-1},Y^{i-1},A_d^i,S_d^{i-1}}(m,a_e^n,s^i,s_e^{i-1},x^{i-1},y^{i-1},a_d^i,s_d^{i-1})$$
$$= P_S(s_i)\times\frac{1}{2^{nR}}\mathbb{1}\{a_e^n = A_e^n(m)\}\mathbb{1}\{a_{d,i} = A_{d,i}(y^{i-1})\}\prod_{j=1}^{i-1}\mathbb{1}\{a_{d,j} = A_{d,j}(y^{j-1})\}P_S(s_j)P_{S_e,S_d|S,A_e,A_d}(s_{e,j},s_{d,j}|s_j,a_{e,j},a_{d,j}) \times \prod_{j=1}^{i-1}\mathbb{1}\{x_j = X_j(m,s_e^j)\}P_{Y|X,S}(y_j|x_j,s_j) \quad (132)$$
$$= \Phi_1(s_i)\,\Phi_2(m,a_e^n,s^{i-1},s_e^{i-1},x^{i-1},y^{i-1},a_d^i,s_d^{i-1}) \quad (133)$$
$$= \Phi_1(s_i,a_{d,i})\,\Phi_2(a_{d,i},u_i,x^{i-1}). \quad (134)$$

MC2 is straightforward, as $U_i$ contains $A_e^n$. For MC1, Eq. (133) implies that $A_{d,i}$ is independent of $S_i$, while Eq. (134) implies the Markov chain $(U_i, X^{i-1}) - A_{d,i} - S_i$, which in turn implies MC1.
For MC4, averaging over $(S_{i+1}^n, S_{e,i+1}^n, X_{i+1}^n, Y_i^n, A_{d,i+1}^n, S_{d,i+1}^n)$ in Eq. (131), we obtain

$$P_{M,A_e^n,S^i,S_e^i,X^i,Y^{i-1},A_d^i,S_d^i}(m,a_e^n,s^i,s_e^i,x^i,y^{i-1},a_d^i,s_d^i)$$
$$= \frac{1}{2^{nR}}\mathbb{1}\{a_e^n = A_e^n(m)\}\mathbb{1}\{x_i = X_i(m,s_e^i)\}\mathbb{1}\{a_{d,i} = A_{d,i}(y^{i-1})\}\,P_S(s_i)P_{S_e,S_d|S,A_e,A_d}(s_{e,i},s_{d,i}|s_i,a_{e,i},a_{d,i}) \quad (135)$$
$$\quad\times\prod_{j=1}^{i-1}P_S(s_j)P_{S_e,S_d|S,A_e,A_d}(s_{e,j},s_{d,j}|s_j,a_{e,j},a_{d,j})\,\mathbb{1}\{x_j = X_j(m,s_e^j)\}P_{Y|X,S}(y_j|x_j,s_j)\,\mathbb{1}\{a_{d,j} = A_{d,j}(y^{j-1})\} \quad (136)$$
$$= \Phi_1(s_i, s_{e,i}, s_{d,i}, a_{e,i}, a_{d,i})\,\Phi_2(m, a_e^n, s^{i-1}, s_e^i, x^i, y^{i-1}, a_d^i, s_d^{i-1}). \quad (137)$$

This implies the Markov chain $(S_e^{i-1}, X_i) - (U_i, S_{e,i}, A_{d,i}) - (S_i, A_{e,i}, S_{d,i})$, which implies MC4.